Coronavirus COVID-19 Cases in Finland

Bernardo Di Chiara, Data Analyst

http://fi.linkedin.com/in/bernardodichiara

Last plotted day: see the end of this file

Last full revision of the comments: June 16th 2020. (Single sections, such as section 9. Conclusion, are updated frequently.)

Table of Contents

1. Executive Summary
....1.1. References
2. Setup
3. Defining the Needed Functions
....3.1. Dataframes and Lists Handling
....3.2. Plots
....3.3. Project-specific Functions
4. Dumping and Collecting the Data
5. Data Analysis
....5.1. Summary
....5.2. Preliminary Data Analysis
....5.3. Data Cleansing
....5.4. Data Preparation
............5.4.1. New datasets with no NaN, no GPS coordinates / list of days / list of Countries
............5.4.2. Population age data
............5.4.3. World Data
............5.4.4. Finnish Data
............5.4.5. Data from other Scandinavian Countries and Estonia
............5.4.6. Data from other European Countries
............5.4.7. Data from UK and US
............5.4.8. Data from Brazil, Russia and India
............5.4.9. Data from China
....5.5. Summary of the Created Datasets
6. Domain-Specific Concepts
7. Data Visualization
....7.1. Overview
............7.1.1. General Comments to the Plots
............7.1.2. A Reference Curve Set
....7.2. Finnish Internal Situation
....7.3. Comparison with the Closest Neighboring Countries
............7.3.1. Comparison with Other Scandinavian Countries and Estonia
....7.4. Comparison with other European Countries
....7.5. Situation in China
....7.6. Situation in Italy
....7.7. UK and US
....7.8. Brazil, Russia and India
....7.9. Normalizing by Country population
............7.9.1. List of Variables Affecting Potentially the Curves
............7.9.2. Confirmed Cases: Summary of Findings from the Analysis
............7.9.3. Deceased Cases: Summary of Findings from the Analysis
....7.10. Demographic Considerations
....7.11. World View
............7.11.1. Lethality
8. Statistics
....8.1. World view
....8.2. Top Ten Countries
....8.3. Finland
9. Conclusions
10. Acknowledgements

1. Executive Summary

This notebook contains visualizations related to the spread of the Coronavirus COVID-19 with a focus on Finland.

The data is taken from the Johns Hokpins University (JHU) /1/.

There are a few good dashboards in the Web about this topic (for example, by Johns Hokpins University /2/ and by Tableau /3/). In addition, there is a good site with latest information about Finland broken down by Region /4/. Another very useful source of information is the European Centre for Disease Prevention and Control /5/. Still, it might be beneficial to manipulate the data in order, for example, to compare Finnish curves with curves from other Countries.

Having updated charts is very useful both for authorities and for the population in order to make fact-based decisions that help to contain the positive cases so not to overload the hospitals and therefore minimizing the casualties.

Comparing Finnish curves to those of neighboring Countries might provide useful insights since, in addition to the geographical proximity and similar weather, those Countries have certain similarities in culture, behavior patterns and may be genetics.

Sections from 2 to 5 contain mostly code which is needed to define the used functions and to dump, cleanse and prepare the data.

General domain specific concepts are contained in section 6. An overview chapter containing a description of the plots and the illustration of a reference case is contained at the beginning of section 7.

Line plots containing confirmed cases each day as well as recovered and deceased cases have been produced. The active cases have been shown in the same plot.

Other plots containing the new confirmed daily cases, which shows the speed at which the virus is spreading, have been added as well. Daily increments have been plotted also for the deceased and the active cases.

Finnish curves have been compared to the curves of the other Scandinavian Countries as well as few other European Countries. Curves of UK, US, Brazil, Russia and India have been plotted as well.

Plots showing the number of confirmed cases per capita have been created to eliminate the population variable from the comparisons. Other plots have been created to normalize by the density of the population.

Finally, plots with worldwide data have been produced. This includes also a couple of plots that try to put the number of deceased cases into context.

Bar plots containing data of the most affected Countries have been added.

Due to the criticality of this information, no recommendations are included in this paper. Currently, Doctors and Authorities are the best sources for such recommendations.

If you are not interested in the code, go to section 6 and onward and focus on the plots, the tables and the plain text.

DISCLAIMER:

  • The code has not been peer-reviewed. If someone is wishing to do it, please contact the author.
  • The data related to the last day might be incomplete.
  • See also the legal disclaimer.

The spread of virus follows the rules of mathematics and statistics (Dr. Katharina Hauck, https://www.imperial.ac.uk/people/k.hauck).

1.1. References

/1/ [GitHub Repository by Johns Hokpins University](https://github.com/CSSEGISandData/COVID-19)
https://github.com/CSSEGISandData/COVID-19

/2/ [Dashboard by Johns Hokpins University with world-wide view](https://www.arcgis.com/apps/opsdashboard/index.html#/bda7594740fd40299423467b48e9ecf6)
https://www.arcgis.com/apps/opsdashboard/index.html#/bda7594740fd40299423467b48e9ecf6

/3/ Dashboard by Tableau with both global and Country-specific data
https://public.tableau.com/profile/covid.19.data.resource.hub#!/vizhome/COVID-19Cases_15840488375320/COVID-19Cases

/4/ Latest news about Finland broken by Region
https://finland-coronavirus-map.netlify.com/

/5/ European Centre for Disease Prevention and Control
https://www.ecdc.europa.eu/en/novel-coronavirus-china

/6/ Coursera: Let's Talk About COVID-19
https://www.coursera.org/learn/covid-19/home/welcome

2. Setup

In [1]:
# Importing the needed packages
import os
import datetime as dt
import regex as re
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns

# Displaying all the dafaframe columns and rows
pd.set_option('display.max_columns', None)
pd.set_option('display.max_rows', None)

# Setting a time stamp
start_time = dt.datetime.utcnow()

3. Defining the Needed Functions

3.1. Dataframes and Lists Handling

In [2]:
def df_basic_data(dfname):
    '''
    This function prints basic information about a given dataframe.
    The function needs as input parameters the dataframe name.
    '''

    import pandas as pd

    # Fetching the dataframe name
    name = [x for x in globals() if globals()[x] is dfname][0]
    print("Dataframe name:", name, "\n")
    print("Dataframe length:", len(dfname), "\n")
    print("Number of columns:", len(dfname.columns), "\n")
    # Columns data types
    data_types = dfname.dtypes
    # Distint values
    distint_values = dfname.apply(pd.Series.nunique)
    # Amount of null values
    null_values = dfname.isnull().sum()
    print("Dataframe's columns names, column data types, amount of distint "
          "(non null) values\n"
          "and amount of null values for each column:")
    df_index = ['Data_Type',
                'Amount_of_Distint_Values',
                'Amount_of_Null_Values']
    col_types_dist_null = pd.DataFrame([data_types,
                                        distint_values,
                                        null_values],
                                       index=df_index)
    return col_types_dist_null.transpose()
In [3]:
def calc_increments(listname):
    '''
    This function:
    takes a list,
    calculates the delta between each element and its predecessor,
    returns the result in a new list having the same lenght as the original list
    '''

    # Initializing an empty list of floats to contain the increments
    increments = []
    # Adding zero to the first element
    increments.append(0.0)
    # Looping through all the occurrencies except the first one
    for i in list(range(1, len(listname))):
        # Calculating the increment
        delta = listname[i]-listname[i-1]
        # Adding the result to the list
        increments.append(delta)
    # Returning the result
    return increments
In [4]:
def find_neg_increm(listname):
    '''
    This function:
    takes a list,
    calculates the delta between each element and its predecessor,
    checks if the increment is negative and
    returns the result in a new list with boolean values having the same lenght as the original list
    '''

    # Initializing an empty list of floats to contain the increments
    neg_increments = []
    # Adding zero to the first element
    neg_increments.append(0)
    # Looping through all the occurrencies except the first one
    for i in list(range(1, len(listname))):
        # Calculating the increment
        delta = listname[i]-listname[i-1]
        # Checking if the increment is negative
        if delta < 0:
            neg_increments.append(1)
        else:
            neg_increments.append(0)
    # Returning the result
    return neg_increments

3.2. Plots

In [5]:
def cust_line_plot(*parameters,
                   figsize_w=8, figsize_h=6,
                   title=None,
                   title_fs=16, title_offset=20,
                   rem_borders=False,
                   label_fs=12, tick_fs=6, 
                   x_label=None,
                   vis_xticks=7,
                   rot=0,
                   y_label=None,
                   legend=False, leg_fs=10, legend_loc=0,
                   first_line_x=None, first_line_col=7,
                   first_line_ls=':', first_line_x_l=None,
                   second_line_x=None, second_line_col=7,
                   second_line_ls='--', second_line_x_l=None,
                   third_line_x=None, third_line_col=7,
                   third_line_ls='-.', third_line_x_l=None,
                   fourth_line_x=None, fourth_line_col=7,
                   fourth_line_ls='-', fourth_line_x_l=None,
                   fifth_line_x=None, fifth_line_col=8,
                   fifth_line_ls=':', fifth_line_x_l=None,
                   sixth_line_x=None, sixth_line_col=8,
                   sixth_line_ls='--', sixth_line_x_l=None,
                   seventh_line_x=None, seventh_line_col=8,
                   seventh_line_ls=':', seventh_line_x_l=None,
                   eighth_line_x=None, eighth_line_col=8,
                   eighth_line_ls='-', eighth_line_x_l=None):
    """
    This function plots a scatterplot for the provided data
    and customizes the way the chart looks by using the value of
    the provided parameters.

    Keyword arguments:
    parameters       -- A (mandatory) tuple of 5 elements containing:
                        a list with the x values,
                        a list with the y values,
                        a string containing the selected marker,
                        a string containing the selected line style,
                        an integer (from 0 to 9) selecting the seaborn-deep
                        color,
                        a string containing the text for the legend
    figsize_w        -- The width of the plot area
    figsize_w        -- The height of the plot area
    title            -- A string containing the title of the chart
    title_fs         -- The title font size
    title_offset     -- Distance between the title and the top of the chart
    rem_borders      -- If True the top and right borders are removed
                        (default: False)
    label_fs         -- x and y axis labels' font size
    tick_fs          -- The tick values font size
    x_label          -- Label for the x-axis (string)
    vis_xticks       -- After how many ticks to show the next tick label
    rot              -- The rotation angle of the tick values
    y_label          -- Label for the y-axis (string)
    legend           -- A boolean variable that tells if to plot a legend
    leg_fs           -- Font size for the legend
    legend_loc       -- An integer from 0 to 9 controlling the legend location
    first_line_x
    ...
    eighth_line_x    -- x coordinates of vertical lines
    first_line_col
    ...
    eighth_line_col -- an integer (from 0 to 9) selecting the seaborn-deep
                        color of the corresponding line    
    first_line_x_l
    ...
    eighth_line_x_l -- legend text for the corresponding lines
    """

    import matplotlib.pyplot as plt
    import seaborn as sns

    # Creating a new figure
    plt.figure(figsize=(figsize_w, figsize_h))
    # Defining the used style
    color_list = sns.color_palette(palette='deep')
    # Adding a title (with some distance to the top of the plot)
    plt.title(title, fontsize=title_fs, pad=title_offset)
    # Removing the top and right borders if so defined
    if rem_borders is True:
        sns.despine(top=True, right=True, left=False, bottom=False)

    # Initializing an empy list to contain the legend text
    leg_text_l = []
    for param in parameters:
        # Extracting the values given in parameters
        x = param[0]
        y = param[1]
        mark = param[2]
        ls = param[3]
        col_numb = param[4]
        leg_text = param[5]
        # Appending the string to the list
        leg_text_l.append(leg_text)
        # Creating the scatter plots
        plot = plt.plot(x, y, marker=mark, linestyle=ls, color=color_list[col_numb])
    
    # If a label for the x axis is provided, showing it on the x axis
    if x_label:
        plt.xlabel(x_label, fontsize=label_fs)
    plt.xticks(fontsize=tick_fs, rotation=rot)
    # If a label for the y axis is provided, showing it on the y axis
    if y_label:
        plt.ylabel(y_label, fontsize=label_fs)
    plt.yticks(fontsize=tick_fs)
    
    # Showing only every nth day in the x axis
    axses = plt.gca() # Get the current axses
    days = axses.xaxis.get_ticklabels() # Get x axis tick abels
    # List of all elements except every nth
    days = list(set(days) - set(days[::vis_xticks]))
    # Omitting all the tick labels except every nth
    for label in days:
        label.set_visible(False)

    # Adding vertical lines
    if first_line_x:
        plt.axvline(x=first_line_x, color=color_list[first_line_col],
                    linestyle=first_line_ls)
        leg_text_l.append(first_line_x_l)
    if second_line_x:
        plt.axvline(x=second_line_x, color=color_list[second_line_col],
                    linestyle=second_line_ls)
        leg_text_l.append(second_line_x_l)
    if third_line_x:        
        plt.axvline(x=third_line_x, color=color_list[third_line_col],
                    linestyle=third_line_ls)
        leg_text_l.append(third_line_x_l)
    if fourth_line_x:        
        plt.axvline(x=fourth_line_x, color=color_list[fourth_line_col],
                    linestyle=fourth_line_ls)
        leg_text_l.append(fourth_line_x_l)
    if fifth_line_x:        
        plt.axvline(x=fifth_line_x, color=color_list[fifth_line_col],
                    linestyle=fifth_line_ls)
        leg_text_l.append(fifth_line_x_l)
    if sixth_line_x:        
        plt.axvline(x=sixth_line_x, color=color_list[sixth_line_col], 
                    linestyle=sixth_line_ls)
        leg_text_l.append(sixth_line_x_l)
    if seventh_line_x:        
        plt.axvline(x=seventh_line_x, color=color_list[seventh_line_col], 
                    linestyle=seventh_line_ls)
        leg_text_l.append(seventh_line_x_l)
    if eighth_line_x:        
        plt.axvline(x=eighth_line_x, color=color_list[eighth_line_col], 
                    linestyle=eighth_line_ls)
        leg_text_l.append(eighth_line_x_l) 
    
    # Adding a legend
    if legend:
        plt.legend(labels=leg_text_l, fontsize=leg_fs, loc=legend_loc,
                   facecolor="white", framealpha=1)

    # Showing the plot without additional text
    plt.show()
In [6]:
def cust_bar_plot(parameters,
                  figsize_w=8, figsize_h=6,
                  title=None, title_fs=16, title_offset=20,
                  rem_borders=False,
                  label_fs=12, tick_fs=6,
                  x_label=None,
                  vis_xticks=7,
                  rot=0,
                  y_label=None,
                  legend=False,
                  leg_fs=10,
                  legend_loc=0,
                  first_line_x=None, first_line_col=7,
                  first_line_ls=':', first_line_x_l=None,
                  second_line_x=None, second_line_col=7,
                  second_line_ls='--', second_line_x_l=None,
                  third_line_x=None, third_line_col=7,
                  third_line_ls='-.', third_line_x_l=None,
                  fourth_line_x=None, fourth_line_col=7,
                  fourth_line_ls='-', fourth_line_x_l=None,
                  fifth_line_x=None, fifth_line_col=8,
                  fifth_line_ls=':', fifth_line_x_l=None,
                  sixth_line_x=None, sixth_line_col=8,
                  sixth_line_ls='--', sixth_line_x_l=None,
                  seventh_line_x=None, seventh_line_col=8,
                  seventh_line_ls=':', seventh_line_x_l=None,
                  eighth_line_x=None, eighth_line_col=8,
                  eighth_line_ls='-', eighth_line_x_l=None,                  
                  first_line_y=None, first_line_y_l=None,
                  second_line_y=None, second_line_y_l=None,
                  third_line_y=None, third_line_y_l=None,
                  fourth_line_y=None, fourth_line_y_l=None):
    """
    This function plots a bar plot for the provided data
    and customizes the way the chart looks by using the value of
    the provided parameters.

    Keyword arguments:
    parameters       -- A (mandatory) tuple of 4 elements containing:
                        a list with the x values,
                        a list with the y values,
                        an integer (from 0 to 9) selecting the seaborn-deep
                        color,
                        a string containing the text for the legend
    figsize_w        -- The width of the plot area
    figsize_w        -- The height of the plot area
    title            -- A string containing the title of the chart
    title_fs         -- The title font size
    title_offset     -- Distance between the title and the top of the chart
    rem_borders      -- If True the top and right borders are removed
                        (default: False)
    label_fs         -- x and y axis labels' font size
    tick_fs          -- The tick values font size
    x_label          -- Label for the x-axis (string)
    vis_xticks       -- After how many ticks to show the next tick label
    rot              -- The rotation angle of the tick values
    y_label          -- Label for the y-axis (string)
    legend           -- A boolean variable that tells if to plot a legend
    leg_fs           -- Font size for the legend
    legend_loc       -- An integer from 0 to 9 controlling the legend location
    first_line_x
    second_line_x
    third_line_x     
    fourth_line_x
    fifth_line_x     
    sixth_line_x     -- x coordinates of vertical lines
    first_line_col
    second_line_col
    third_line_col     
    fourth_line_col
    fifth_line_col
    sixth_line_col   -- an integer (from 0 to 9) selecting the seaborn-deep
                        color of the corresponding line    
    first_line_x_l
    ...
    eighth_line_x_l   -- legend text for the corresponding lines
    first_line_y
    second_line_y
    third_line_y     
    fourth_line_y    -- y coordinates of horizontal lines
    first_line_y_l
    second_line_y_l
    third_line_y_l   
    fourth_line_y_l  -- legend text for the corresponding lines
    """

    import matplotlib.pyplot as plt
    import seaborn as sns

    # Creating a new figure
    plt.figure(figsize=(figsize_w, figsize_h))
    # Defining the used style
    color_list = sns.color_palette(palette='deep')
    # Adding a title (with some distance to the top of the plot)
    plt.title(title, fontsize=title_fs, pad=title_offset)
    # Removing the top and right borders if so defined
    if rem_borders is True:
        sns.despine(top=True, right=True, left=False, bottom=False)

    # Initializing an empy list to contain the legend text
    leg_text_l = []
    # Extracting the values given in parameters
    x = parameters[0]
    y = parameters[1]
    col_numb = parameters[2]
    leg_text = parameters[3]

    # Creating the bar plot
    plot = plt.bar(x, y, color=color_list[col_numb])

    # If a label for the x axis is provided, showing it on the x axis
    if x_label:
        plt.xlabel(x_label, fontsize=label_fs)
    plt.xticks(fontsize=tick_fs, rotation=rot)
    # If a label for the y axis is provided, showing it on the y axis
    if y_label:
        plt.ylabel(y_label, fontsize=label_fs)
    plt.yticks(fontsize=tick_fs)
    
    # Showing only every nth day in the x axis
    axses = plt.gca() # Get the current axses
    days = axses.xaxis.get_ticklabels() # Get x axis tick abels
    # List of all elements except every nth
    days = list(set(days) - set(days[::vis_xticks]))
    # Omitting all the tick labels except every nth
    for label in days:
        label.set_visible(False)

    # Adding vertical lines
    if first_line_x:
        plt.axvline(x=first_line_x, color=color_list[first_line_col],
                    linestyle=first_line_ls)
        leg_text_l.append(first_line_x_l)
    if second_line_x:
        plt.axvline(x=second_line_x, color=color_list[second_line_col],
                    linestyle=second_line_ls)
        leg_text_l.append(second_line_x_l)
    if third_line_x:        
        plt.axvline(x=third_line_x, color=color_list[third_line_col],
                    linestyle=third_line_ls)
        leg_text_l.append(third_line_x_l)
    if fourth_line_x:        
        plt.axvline(x=fourth_line_x, color=color_list[fourth_line_col],
                    linestyle=fourth_line_ls)
        leg_text_l.append(fourth_line_x_l)
    if fifth_line_x:        
        plt.axvline(x=fifth_line_x, color=color_list[fifth_line_col],
                    linestyle=fifth_line_ls)
        leg_text_l.append(fifth_line_x_l)
    if sixth_line_x:        
        plt.axvline(x=sixth_line_x, color=color_list[sixth_line_col], 
                    linestyle=sixth_line_ls)
        leg_text_l.append(sixth_line_x_l)
    if seventh_line_x:        
        plt.axvline(x=seventh_line_x, color=color_list[seventh_line_col], 
                    linestyle=seventh_line_ls)
        leg_text_l.append(seventh_line_x_l)
    if eighth_line_x:        
        plt.axvline(x=eighth_line_x, color=color_list[eighth_line_col], 
                    linestyle=eighth_line_ls)
        leg_text_l.append(eighth_line_x_l) 
        
    # Adding horizontal lines
    if first_line_y:
        plt.axhline(y=first_line_y, color='grey', linestyle=':')
        leg_text_l.append(first_line_y_l)
    if second_line_y:
        plt.axhline(y=second_line_y, color='grey', linestyle='--')
        leg_text_l.append(second_line_y_l)
    if third_line_y:
        plt.axhline(y=third_line_y, color='grey', linestyle='-.')
        leg_text_l.append(third_line_y_l)
    if fourth_line_y:
        plt.axhline(y=fourth_line_y, color='grey', linestyle='-.')
        leg_text_l.append(fourth_line_y_l)

    # Adding a legend
    if legend:
        leg_text_l.append(leg_text)
        plt.legend(labels=leg_text_l, fontsize=leg_fs, loc=legend_loc,
                   facecolor="white", framealpha=1)

    # Showing the plot without additional text
    plt.show()
In [7]:
def plot_stacked_bar(x, data, series_labels, col,
                     multidim=True, figsize_w=8, figsize_h=6,
                     title=None, title_fs=16,
                     frame=True,
                     category_labels=None,
                     label_fs=12, ticks_fs=12,
                     x_label=None, vis_xticks=7, rot=0,
                     y_label=None,
                     legend=True, legend_loc=0, legend_fs=10,
                     add_text=None, addtext_x=0, addtext_y=0, addtext_fs=10):
    """
    This function plots a stacked bar chart with the provided data and
    labels.

    Keyword arguments:
    x               -- A list containing the x values (mandatory)
    data            -- A list of lists where each internal list contains
                       data of a series (mandatory)
    series_labels   -- List of series labels (strings) (these appear in
                       the legend) (mandatory)
    col             -- A list of integers controlling the colors of the series
                       (mandatory)
    multidim        -- Defines if data is multidimensional (default is True)
    figsize_w       -- The width of the plot area
    figsize_w       -- The height of the plot area
    title           -- A string containing the title of the chart
    title_fs        -- The title font size
    frame           -- If False, the figure frame is omitted as well as
                       ticks and labels on the y axis
    category_labels -- List of category labels (strings) (these appear
                       on the x-axis)
    label_fs        -- x and y axis labels' font size
    tick_fs         -- The tick values font size
    vis_xticks       -- After how many ticks to show the next tick label
    rot             -- The rotation of the x axisis label (numerical)
                       (the default is horizontal)
    y_label         -- Label for the y-axis (string)
    legend          -- If true it shows a legend
    legend_loc      -- Used to position the legend compared to the centre
                       of the plot
    legend_fs       -- Legend font size
    add_text        -- Additional text to be shown in a box (string)
    addtext_x       -- Used to position the additional text box
    addtext_y       -- Used to position the additional text box
    addtext_fs      -- Font size of the additional text
    """

    # Finding the number of categories
    if multidim:
        cat_number = len(data[0])
    else:
        cat_number = len(data)

    # Preparing the indexes for the x axis
    ind = list(range(cat_number))
    # Initializing a list
    axes = []
    # Defining a numpy array containing the y coordinates of the bars
    # (the bars of the first series are on the x axis)
    bar_base = np.zeros(cat_number)
    # Converting the list with the data into a numpy array
    data = np.array(data)

    # Creating a new figure
    plt.figure(figsize=(figsize_w, figsize_h))
    # Defining the used style
    color_list = sns.color_palette(palette='deep')
    # Adding a title (with some distance to the top of the plot)
    plt.title(title, fontsize=title_fs, pad=20)
    # Removing the frame and y axis ticks and values if so defined
    if frame is False:
        sns.despine(top=True, right=True, left=False, bottom=False)

    # If category labes are provided, showing them on the x axis
    if category_labels:
        plt.xticks(ind, category_labels, fontsize=ticks_fs, rotation=rot)

    # If a label for the x axis is provided, showing it on the x axis
    if x_label:
        plt.xlabel(x_label, fontsize=label_fs)
    # If a label for the y axis is provided, showing it on the y axis
    if y_label:
        plt.ylabel(y_label, fontsize=label_fs)

    # Showing only every nth day in the x axis
    axses = plt.gca() # Get the current axses
    days = axses.xaxis.get_ticklabels() # Get x axis tick abels
    # List of all elements except every nth
    days = list(set(days) - set(days[::vis_xticks]))
    # Omitting all the tick labels except every nth
    for label in days:
        label.set_visible(False)
    
    if multidim:
        # Iterating through the dimensions of the array
        for i, row_data in enumerate(data):
            # Creating the bars
            axes.append(plt.bar(x, row_data, bottom=bar_base,
                                color=color_list[col[i]],
                                label=series_labels[i]))
            # Incrementing the bar base height for the next series
            # by the height of the bar of the previous series
            bar_base += row_data
    else:
        # Creating the bars
        axes.append(plt.bar(x, data))

    # Creating a legend
    if legend:
        plt.legend(fontsize=legend_fs, loc=legend_loc,
                   facecolor="white", framealpha=1)

    # Adding a text box with additional information
    if add_text:
        box_style = dict(facecolor='white')
        plt.gcf().text(addtext_x, addtext_y,
                       add_text,
                       fontsize=addtext_fs, bbox=box_style)

    # Showing the plot without additional text
    plt.show()
In [8]:
def plot_cust_hbar(data,
                   figsize_w=8, figsize_h=6,
                   frame=True, grid=False,
                   ref_font_size=12,
                   title_text=None,
                   title_offset=20,
                   color_numb=0,
                   categ_labels=True,
                   labels=None,
                   rot=0,
                   show_values=False,
                   omitted_value=0,
                   percent=False,
                   center_al=True,
                   visible_digits=2):
    """
    This function plots a horizontal bar charts for the provided data with
    the provided labels and settings.

    Keyword arguments:
    data            -- A sorted Series that contains categorical data
                       (mandatory)
    figsize_w       -- The width of the plot area
    figsize_h       -- The height of the plot area
    frame           -- If False, the figure frame is omitted as well as
                       ticks and labels on the y axis (default is True)
    grid            -- If True a horizontal grid is displayed. It works
                       only when frame=True (default is False)
    ref_font_size   -- Reference font size used for all the fonts
    title_text      -- A string containing the title of the chart
    title_offset    -- The offset of the title from the rest of the plot
    color_numb      -- An integer between 0 and 9 that indicated the
                       seaborn-deep color to be used for the bars
    categ_labels    -- A boolean variable that defines if category labels
                       shall appear (on the y-axis)
    labels          -- List of category labels (strings) used only if
                       categ_labels=True.
                       They override the existing labels
    rot             -- The rotation of the x axsis label (numerical)
                       (the default is horizontal)
    show_values     -- If True, then numeric value labels will be shown on
                       each bar (default is False)
    omitted_value   -- The max value that shall not be shown in the bar
    percent         -- If true, it indicates that the values are in percentage
                       (default is False)
    center_al       -- A boolean variable that defines if the values shall be
                       written in the centre of the bar (default is True)
    visible_digits  -- Integer defining the number of decimal digits
                       to be seen in the value labels (the default is 2)
    """

    import pandas as pd
    import matplotlib.pyplot as plt
    import seaborn as sns

    # Defining the suffix to be shown in the bar values
    if percent:
        p = '%'
    else:
        p = ""

    # Preparing the indexes for the x axis
    ind = list(range(len(data)))

    # Creating a new figure
    fig = plt.figure(figsize=(figsize_w, figsize_h))
    # Defining the used style
    color_list = sns.color_palette(palette='deep')

    # Removing y axis ticks
    plt.gca().yaxis.set_ticks_position('none')

    if frame is False:
        # Removing the borders, if so defined
        sns.despine(top=True, right=True, left=True, bottom=True)
        # Removing ticks and values in the x axes
        plt.gca().axes.get_xaxis().set_visible(False)
    elif grid:
        # Showing a vertical grig, if so defined
        plt.gca().xaxis.grid(color='grey', alpha=0.25,
                             linestyle='-', linewidth=1)

    # Adding a title (with some distance to the top of the plot)
    plt.title(title_text, fontsize=ref_font_size*1.33,
              loc='center', pad=title_offset)

    # Creating the bar plot
    plot = plt.barh(ind, data, color=color_list[color_numb])

    # Showing category labels on the y axes, if so defined
    if categ_labels:
        # Overriding the index value if category labels are provided
        if labels:
            plt.yticks(ind, labels, fontsize=ref_font_size, rotation=rot)
        else:
            plt.yticks(ind, data.index.tolist(),
                       fontsize=ref_font_size, rotation=rot)
    else:
        # Removing ticks and values in the y axes
        plt.gca().axes.get_yaxis().set_visible(False)

    # Showing the bar values, if so defined
    if show_values:
        # Iterating through the bars in the plot
        for bar in plot:
            # Getting bar height and width
            w, h = bar.get_width(), bar.get_height()
            # Printing the values only if they are bigger than the defined value
            if w > omitted_value:
                if center_al is True:
                    # Positioning the text in the centre of the bar horizontally
                    # and vertically
                    plt.text(bar.get_x() + w/2, bar.get_y() + h/2,
                             "{}".format(round(w, visible_digits))+p,
                             fontsize=ref_font_size, color="white",
                             ha="center", va="center")
                else:
                    # Positioning the text at the right of the bar horizontally
                    # and in the centre vertically
                    plt.text(bar.get_x() + w, bar.get_y() + h/2,
                             "{}".format(round(w, visible_digits))+p,
                             fontsize=ref_font_size,
                             ha="left", va="center")

    # Showing the plot without additional text
    plt.show()

3.3. Project-specific Functions

In [9]:
def find_last_day():
    '''
    This function reads in a certain directory to find the latest CSV file
    and returns the date of the last file in a string in the format mm-dd-yyyy
    '''

    # Getting the list of files in the daily reports folder
    for roots, dirs, files in os.walk('JHU_COVID-19/COVID-19/'
                                      'csse_covid_19_data/'
                                      'csse_covid_19_daily_reports'):
        file_list = files  # list of strings
        # Initializing a new list
        dates = []
        # Iterating through the original list
        for i in list(range(len(file_list))):
            file = file_list[i]
            # If is it a csv file ...
            if re.search("\S+[csv]", file):
                # Extracting the date into a list of string
                date = re.findall("[0-9]+[-][0-9]+[-][0-9]+", file)
                # Converting the format from string to date
                dt_date = dt.datetime.strptime(date[0], "%m-%d-%Y")
                # Appending the date to a list of dates (the new list)
                dates.append(dt_date)
    # Sorting the dates and taking the last one
    dates.sort(reverse=True)
    latest = dates[0]  # datetime
    # Converting the latest date to a string
    last_day = latest.strftime("%m-%d-%Y")

    return last_day
In [10]:
def extract_country(Country, State=None, days=0):
    '''
    This function allows selecting data related to a specific Country
    from the datasets produced by JHU.
    It takes the following input:
    - a string containing the Country name written with the first letter
    as a capital letter (mandatory)
    - a string containing the State name written with the first letter
    as a capital letter (optional)
    - an integer containing how many days to skip (default = 0)
    It returns a tuple of 3 lists containing data related to confirmed
    recovered and deceased cases.
    '''

    # Extracting confirmed cases
    if State:
        confirm = world_conf_clean[(world_conf_clean['Country/Region'] == Country) &
                                   (world_conf_clean['Province/State'] == State)]
    else:
        confirm = world_conf_clean[(world_conf_clean['Country/Region'] == Country)]
    # Extracting the columns containing the data for each day
    # by skipping a number of days equal to days
    confirm = confirm.iloc[:, 4+days:]
    # Copying the result into a list
    confirm_l = confirm.values.tolist()[0]
    
    # Extracting recovered cases
    if State:
        recov = world_recov_clean[(world_recov_clean['Country/Region'] == Country) &
                                  (world_recov_clean['Province/State'] == State)]
    else:
        recov = world_recov_clean[(world_recov_clean['Country/Region'] == Country)]
    # Extracting the columns containing the data for each day
    # by skipping a number of days equal to days
    recov = recov.iloc[:, 4+days:]
    # Copying the result into a list
    recov_l = recov.values.tolist()[0]

    # Extracting deceased cases
    if State:
        deceas = world_deceas_clean[(world_deceas_clean['Country/Region'] ==
                                     Country) &
                                    (world_deceas_clean['Province/State'] == State)]
    else:
        deceas = world_deceas_clean[(world_deceas_clean['Country/Region'] ==
                                     Country)]
    # Extracting the columns containing the data for each day
    # by skipping a number of days equal to days
    deceas = deceas.iloc[:, 4+days:]
    # Copying the result into a list
    deceas_l = deceas.values.tolist()[0]

    return confirm_l, recov_l, deceas_l
In [11]:
def extract_non_null(input_list):
    '''
    This function takes as input a list that contains a certain number of
    zero values, omits such values and returns what is left in a new list.
    '''

    # Initializing a list
    no_null = []
    # Looping through all the elements of the list
    for i in list(range(len(input_list))):
        if input_list[i] != 0:
            # Extracting non null values
            no_null.append(input_list[i])

    return no_null
In [12]:
def pop_perc(values, pop):
    '''
    This function takes the following inputs:
    - a list of floats in units
    - a float in million of units

    The function calculates the percentage values of the values in the list
    compared to the value in the single float multiplied one million times.
    The function is useful, for example, to calculate the number of
    confirmed Coronavirus cases pro capita
    (in percentage of the total pupulation in millions).

    The function retunts a list of floats.
    '''

    result = (pd.Series(values)/(pop*1000000))*100

    return result
In [13]:
def prep_country_data(Country, pop, State=None, days=0):
    '''
    This function allows to prepare the data for a specific Country.

    It takes the following inputs:

    - a string variable that contains the name of the Country
    written with the first letter as a capital letter (mandatory)
    - a float that contains the Country population in millions (mandatory)
    - a string containing the State name written with the first letter
    as a capital letter (optional)
    - an integer that tells the number of initial days in the time series
    to skip (default = 0)

    The function uses the following functions:

    - 'extract_country' to extract Country-specific information from
    the relevant dataframes
    - 'calc_increments' to calculate the daily increments in a time series
    - 'extract_non_null' to extract only the non null values of a time series
    - 'pop_perc' to calculate the values of a series in percentage of the Country population

    The output is a tuple with the following content:

    - a list containing a time series with the cumulative confirmed cases
    - a list containing a time series with the cumulative recovered cases
    - a list containing a time series with the cumulative deceased cases
    - a list containing a time series with the cumulative active cases
    - a list containing a time series with the daily increment
    in the confirmed cases
    - a list containing a time series with the daily increment
    in the deceased cases
    - a list containing a time series with the cumulative confirmed cases
    starting from the day of the first positive case
    - a list containing a time series with the cumulative confirmed cases per capita
    - a list containing a time series with the cumulative deceased cases per capita
    '''

    # Getting the name of thew Country in small letters
    country = Country.lower()
    
    # Extracting country-speficic data by using the function extract_country
    countryname_hiddendays = extract_country(Country, State, days)
    # Extracting the time series for the cumulative confirmed cases
    countryname_conf_hiddend = countryname_hiddendays[0]
    # Extracting the time series for the cumulative recovered cases
    countryname_recov_hiddend = countryname_hiddendays[1]
    # Extracting the time series for the cumulative deceased cases
    countryname_deceas_hiddend = countryname_hiddendays[2]
    # Calculating the active cases
    countryname_act_hiddend = list(np.array(countryname_conf_hiddend) - \
                                   np.array(countryname_recov_hiddend) - \
                                   np.array(countryname_deceas_hiddend))    
    # Extracting the time series for the daily increments in the confirmed cases
    countryname_conf_incr_hiddend = calc_increments(countryname_conf_hiddend)
    # Extracting the time series for the daily increments in the deceased cases
    countryname_deceas_incr_hiddend = calc_increments(countryname_deceas_hiddend)    
    # Extracting the complete time series about the cumulative confirmed cases
    complete_conf_series = extract_country(Country, State, 0)
    # Extracting the time series for the cumulative confirmed cases
    # starting from the day of the first positive case
    countryname_conf_pos = extract_non_null(complete_conf_series[0])
    # Extracting the time series for the cumulative confirmed cases per capita
    countryname_conf_hiddend_perc = pop_perc(countryname_conf_hiddend, pop)
    # Extracting the time series for the cumulative deceased cases per capita
    countryname_deceas_hiddend_perc = pop_perc(countryname_deceas_hiddend, pop)

    return countryname_conf_hiddend, \
           countryname_recov_hiddend, \
           countryname_deceas_hiddend, \
           countryname_act_hiddend, \
           countryname_conf_incr_hiddend, \
           countryname_deceas_incr_hiddend, \
           countryname_conf_pos, \
           countryname_conf_hiddend_perc, \
           countryname_deceas_hiddend_perc        
In [14]:
def find_error_days(listname):
    '''
    This function:
    takes a list,
    finds if the list contains negative increments by using the function find_neg_increm(listname),
    compares the position of such negative increments to the position of the days in the list days_tot and
    returns the corresponding days in a new list
    '''
    
    
    # Initializing a list to contain the positions in the list containing negative increments
    posit = []
    # Initializing a list to contain the days corresponding to negative increments in the list
    result = []
    # Checking for negative increments in the input list and storing their positions
    for position, item in enumerate(find_neg_increm(listname)):
        if item == 1:
            posit.append(position)
    # Finding the corresponding day 
    for position, item in enumerate(days_tot):
        if position in posit:
            result.append(item)
    print(result)

4. Dumping and Collecting the Data

The source csv files are located in the following directoryies:

  • JHU_COVID-19/COVID-19/csse_covid_19_data/csse_covid_19_time_series
  • JHU_COVID-19/COVID-19/csse_covid_19_data/csse_covid_19_daily_reports

Those directory shall be located under the directory containing this notebook.

In [15]:
# Loading the data files into pandas dataframes
# Loading the world time series
world_confirmed = pd.read_csv('JHU_COVID-19/COVID-19/csse_covid_19_data/'
                              'csse_covid_19_time_series/'
                              'time_series_covid19_confirmed_global.csv')
world_recovered = pd.read_csv('JHU_COVID-19/COVID-19/csse_covid_19_data/'
                             'csse_covid_19_time_series/'
                             'time_series_covid19_recovered_global.csv')
world_deceased = pd.read_csv('JHU_COVID-19/COVID-19/csse_covid_19_data/'
                             'csse_covid_19_time_series/'
                             'time_series_covid19_deaths_global.csv')
In [16]:
# Uploading the latest daily report
last_day = find_last_day()  # calling the function last_day
daily_report = pd.read_csv('JHU_COVID-19/COVID-19/csse_covid_19_data/'
                           'csse_covid_19_daily_reports/' + last_day + '.csv')

File descriptions

  • time_series_covid19_confirmed_global.csv: confirmed cases for each day for each Country
  • time_series_covid19_recovered_global.csv: recovered cases for each day for each Country
  • time_series_covid19_deaths_global.csv: confirmed cases for each day for each Country
  • mm-dd-yyyy.csv: last available daily report
In [17]:
# Storing the total population for the Countries of interest (in millions)
# (source: Google)
italy_pop = 60.48
spain_pop = 46.66
germany_pop = 82.79
france_pop = 66.99
switzerland_pop = 8.57
netherlands_pop = 17.18
austria_pop = 8.822
belgium_pop = 11.4
portugal_pop = 10.29
luxembourg_pop = 0.602
poland_pop = 37.97
ireland_pop = 4.904
estonia_pop = 1.328
denmark_pop = 5.603
norway_pop = 5.368
sweden_pop = 10.12
iceland_pop = 0.364
finland_pop = 5.513
uk_pop = 66.44
us_pop = 327.2
hubei_pop = 58.5
china_pop = 1386
restchina_pop = china_pop-hubei_pop
brazil_pop = 212.559
russia_pop = 145.9
india_pop = 1380
In [18]:
# Storing the population density for the Countries of interest (people/km2)
# (source: Google)
italy_dens = 201.3
spain_dens = 91.4
germany_dens = 240
france_dens = 122.34
switzerland_dens = 219
netherlands_dens = 488
austria_dens = 109
belgium_dens = 383
portugal_dens = 111
luxembourg_dens = 242
poland_dens = 124
ireland_dens = 72
estonia_dens = 31
denmark_dens = 134
norway_dens = 15
sweden_dens = 25
iceland_dens = 3
finland_dens = 15
uk_dens = 274
us_dens = 36
hubei_dens = 310
china_dens = 145
brazil_dens = 25
russia_dens = 8.54
india_dens = 464
In [19]:
# Storing the median age for the Countries of interest
# source: https://en.wikipedia.org/wiki/List_of_countries_by_median_age
italy_median_age = 45.5
spain_median_age = 42.7
germany_median_age = 45.7
france_median_age = 41.4
switzerland_median_age = 42.4
netherlands_median_age = 42.6
austria_median_age = 44.0
belgium_median_age = 41.4
portugal_median_age = 42.2
luxembourg_median_age = 39.3
poland_median_age = 39.7
ireland_median_age = 36.5
estonia_median_age = 41.6
denmark_median_age = 42.2
norway_median_age = 39.2
sweden_median_age = 41.2
iceland_median_age = 36.5
finland_median_age = 42.5
uk_median_age = 40.5
us_median_age = 38.1
china_median_age = 37.4
brazil_median_age = 31.4
russia_median_age = 38.6
india_median_age = 26.8
In [20]:
# List of containment actions taken by the Finnish Government

# Creating a dataframe
measures = pd.DataFrame(columns=['Date', 'Actions'])

# Adding the actions
measures = measures.append(pd.Series(["12.3.",
"First containment measures: gathering of more than 500 people banned"],
index=measures.columns), ignore_index=True)

measures = measures.append(pd.Series(["16.3.",
"State of emergency declared: closing shools, universities, museums, theatres, \
libraries, sport facilities; gathering of more than 10 people banned"],
index=measures.columns), ignore_index=True)

measures = measures.append(pd.Series(["28.3.",
"Additional containment measures: Uusimaa region borders closed,  \
restaurant dining forbidden"],
index=measures.columns), ignore_index=True)

measures = measures.append(pd.Series(["11.4.",
"Additional containment measures: No passengers in ships from Germany, Sweden, Estonia"],
index=measures.columns), ignore_index=True)

measures = measures.append(pd.Series(["15.4.",
"First releasing measures: Uusima border re-opened"],
index=measures.columns), ignore_index=True)

measures = measures.append(pd.Series(["14.5.",
"More releasing misures: schools opening, business travell allowed within Schengen"],
index=measures.columns), ignore_index=True)

measures = measures.append(pd.Series(["1.6.",
"Further releasing: gathering up to 50 people allowed, reopening of bars and restaurants, \
reopening of museums and theatres"],
index=measures.columns), ignore_index=True)

measures = measures.append(pd.Series(["15.6.",
"End of state of emergency"],
index=measures.columns), ignore_index=True)

5. Data Analysis

5.1. Summary

Preliminary Data Analysis

The 3 time series files have columns for Province/State, Country/Region, latitude, longitude and data for each day. The columns related to the day are named in the format m/d/yy.

Each entry represents a different location. One Country can be associated with more than one State/Province and in this case one Country has more than one entry. This happens for US, China, Canada, France, Australia, United Kingdom, Netherlands and Denmark.

The daily report file has columns for Province/State, Country/Region, latitude, longitude and time stamp as well as cumulative confirmed, deaths and recovered cases.

Data Cleansing

NaN values have been handled by filling with the string "Not applicable".

Data Preparation

Separate datasets with no GPS coordinates and no time stamp have been created.

Separate datasets have been created to group data by Country.

A list of relevant dates for the plots has been created.

Country specific data has been extracted.

World-wide grand totals have been calculated.

A summary of the created datasets is available in section 5.5.

5.2. Preliminary Data Analysis

In [21]:
# Showing basic dataframe info
df_basic_data(world_confirmed)
Dataframe name: world_confirmed 

Dataframe length: 267 

Number of columns: 279 

Dataframe's columns names, column data types, amount of distint (non null) values
and amount of null values for each column:
Out[21]:
Data_Type Amount_of_Distint_Values Amount_of_Null_Values
Province/State object 81 186
Country/Region object 189 0
Lat float64 263 0
Long float64 264 0
1/22/20 int64 11 0
1/23/20 int64 15 0
1/24/20 int64 19 0
1/25/20 int64 28 0
1/26/20 int64 29 0
1/27/20 int64 33 0
1/28/20 int64 36 0
1/29/20 int64 37 0
1/30/20 int64 40 0
1/31/20 int64 41 0
2/1/20 int64 44 0
2/2/20 int64 43 0
2/3/20 int64 43 0
2/4/20 int64 45 0
2/5/20 int64 46 0
2/6/20 int64 45 0
2/7/20 int64 44 0
2/8/20 int64 47 0
2/9/20 int64 47 0
2/10/20 int64 46 0
2/11/20 int64 48 0
2/12/20 int64 46 0
2/13/20 int64 48 0
2/14/20 int64 46 0
2/15/20 int64 46 0
2/16/20 int64 48 0
2/17/20 int64 49 0
2/18/20 int64 49 0
2/19/20 int64 50 0
2/20/20 int64 47 0
2/21/20 int64 49 0
2/22/20 int64 51 0
2/23/20 int64 50 0
2/24/20 int64 51 0
2/25/20 int64 55 0
2/26/20 int64 52 0
2/27/20 int64 57 0
2/28/20 int64 59 0
2/29/20 int64 60 0
3/1/20 int64 63 0
3/2/20 int64 65 0
3/3/20 int64 64 0
3/4/20 int64 71 0
3/5/20 int64 70 0
3/6/20 int64 77 0
3/7/20 int64 78 0
3/8/20 int64 85 0
3/9/20 int64 83 0
3/10/20 int64 89 0
3/11/20 int64 95 0
3/12/20 int64 100 0
3/13/20 int64 107 0
3/14/20 int64 111 0
3/15/20 int64 118 0
3/16/20 int64 123 0
3/17/20 int64 131 0
3/18/20 int64 135 0
3/19/20 int64 140 0
3/20/20 int64 144 0
3/21/20 int64 147 0
3/22/20 int64 161 0
3/23/20 int64 164 0
3/24/20 int64 168 0
3/25/20 int64 171 0
3/26/20 int64 176 0
3/27/20 int64 189 0
3/28/20 int64 186 0
3/29/20 int64 195 0
3/30/20 int64 184 0
3/31/20 int64 192 0
4/1/20 int64 193 0
4/2/20 int64 195 0
4/3/20 int64 200 0
4/4/20 int64 206 0
4/5/20 int64 200 0
4/6/20 int64 202 0
4/7/20 int64 205 0
4/8/20 int64 208 0
4/9/20 int64 216 0
4/10/20 int64 215 0
4/11/20 int64 209 0
4/12/20 int64 213 0
4/13/20 int64 221 0
4/14/20 int64 217 0
4/15/20 int64 217 0
4/16/20 int64 217 0
4/17/20 int64 216 0
4/18/20 int64 221 0
4/19/20 int64 223 0
4/20/20 int64 222 0
4/21/20 int64 224 0
4/22/20 int64 231 0
4/23/20 int64 231 0
4/24/20 int64 231 0
4/25/20 int64 232 0
4/26/20 int64 227 0
4/27/20 int64 228 0
4/28/20 int64 229 0
4/29/20 int64 232 0
4/30/20 int64 227 0
5/1/20 int64 232 0
5/2/20 int64 234 0
5/3/20 int64 231 0
5/4/20 int64 233 0
5/5/20 int64 234 0
5/6/20 int64 235 0
5/7/20 int64 235 0
5/8/20 int64 231 0
5/9/20 int64 236 0
5/10/20 int64 237 0
5/11/20 int64 234 0
5/12/20 int64 235 0
5/13/20 int64 235 0
5/14/20 int64 233 0
5/15/20 int64 236 0
5/16/20 int64 236 0
5/17/20 int64 240 0
5/18/20 int64 235 0
5/19/20 int64 234 0
5/20/20 int64 237 0
5/21/20 int64 233 0
5/22/20 int64 239 0
5/23/20 int64 242 0
5/24/20 int64 243 0
5/25/20 int64 240 0
5/26/20 int64 241 0
5/27/20 int64 242 0
5/28/20 int64 245 0
5/29/20 int64 242 0
5/30/20 int64 244 0
5/31/20 int64 244 0
6/1/20 int64 243 0
6/2/20 int64 247 0
6/3/20 int64 248 0
6/4/20 int64 247 0
6/5/20 int64 247 0
6/6/20 int64 243 0
6/7/20 int64 243 0
6/8/20 int64 246 0
6/9/20 int64 245 0
6/10/20 int64 251 0
6/11/20 int64 249 0
6/12/20 int64 248 0
6/13/20 int64 244 0
6/14/20 int64 246 0
6/15/20 int64 247 0
6/16/20 int64 243 0
6/17/20 int64 244 0
6/18/20 int64 249 0
6/19/20 int64 249 0
6/20/20 int64 249 0
6/21/20 int64 253 0
6/22/20 int64 250 0
6/23/20 int64 249 0
6/24/20 int64 247 0
6/25/20 int64 249 0
6/26/20 int64 249 0
6/27/20 int64 248 0
6/28/20 int64 249 0
6/29/20 int64 242 0
6/30/20 int64 247 0
7/1/20 int64 250 0
7/2/20 int64 252 0
7/3/20 int64 251 0
7/4/20 int64 252 0
7/5/20 int64 253 0
7/6/20 int64 253 0
7/7/20 int64 255 0
7/8/20 int64 254 0
7/9/20 int64 254 0
7/10/20 int64 256 0
7/11/20 int64 259 0
7/12/20 int64 260 0
7/13/20 int64 255 0
7/14/20 int64 254 0
7/15/20 int64 254 0
7/16/20 int64 246 0
7/17/20 int64 251 0
7/18/20 int64 251 0
7/19/20 int64 249 0
7/20/20 int64 250 0
7/21/20 int64 254 0
7/22/20 int64 253 0
7/23/20 int64 250 0
7/24/20 int64 255 0
7/25/20 int64 254 0
7/26/20 int64 254 0
7/27/20 int64 255 0
7/28/20 int64 255 0
7/29/20 int64 260 0
7/30/20 int64 258 0
7/31/20 int64 259 0
8/1/20 int64 258 0
8/2/20 int64 257 0
8/3/20 int64 254 0
8/4/20 int64 253 0
8/5/20 int64 252 0
8/6/20 int64 253 0
8/7/20 int64 255 0
8/8/20 int64 255 0
8/9/20 int64 255 0
8/10/20 int64 255 0
8/11/20 int64 254 0
8/12/20 int64 253 0
8/13/20 int64 252 0
8/14/20 int64 257 0
8/15/20 int64 253 0
8/16/20 int64 256 0
8/17/20 int64 258 0
8/18/20 int64 259 0
8/19/20 int64 258 0
8/20/20 int64 258 0
8/21/20 int64 256 0
8/22/20 int64 258 0
8/23/20 int64 258 0
8/24/20 int64 260 0
8/25/20 int64 255 0
8/26/20 int64 257 0
8/27/20 int64 258 0
8/28/20 int64 259 0
8/29/20 int64 259 0
8/30/20 int64 259 0
8/31/20 int64 256 0
9/1/20 int64 259 0
9/2/20 int64 257 0
9/3/20 int64 255 0
9/4/20 int64 255 0
9/5/20 int64 257 0
9/6/20 int64 258 0
9/7/20 int64 256 0
9/8/20 int64 258 0
9/9/20 int64 258 0
9/10/20 int64 258 0
9/11/20 int64 257 0
9/12/20 int64 257 0
9/13/20 int64 256 0
9/14/20 int64 256 0
9/15/20 int64 258 0
9/16/20 int64 259 0
9/17/20 int64 258 0
9/18/20 int64 260 0
9/19/20 int64 258 0
9/20/20 int64 257 0
9/21/20 int64 255 0
9/22/20 int64 257 0
9/23/20 int64 256 0
9/24/20 int64 258 0
9/25/20 int64 259 0
9/26/20 int64 258 0
9/27/20 int64 258 0
9/28/20 int64 262 0
9/29/20 int64 263 0
9/30/20 int64 261 0
10/1/20 int64 261 0
10/2/20 int64 261 0
10/3/20 int64 260 0
10/4/20 int64 261 0
10/5/20 int64 261 0
10/6/20 int64 262 0
10/7/20 int64 260 0
10/8/20 int64 260 0
10/9/20 int64 259 0
10/10/20 int64 260 0
10/11/20 int64 260 0
10/12/20 int64 255 0
10/13/20 int64 258 0
10/14/20 int64 260 0
10/15/20 int64 260 0
10/16/20 int64 258 0
10/17/20 int64 255 0
10/18/20 int64 256 0
10/19/20 int64 259 0
10/20/20 int64 255 0
10/21/20 int64 256 0
10/22/20 int64 257 0
In [22]:
# Showing basic dataframe info
df_basic_data(world_recovered)
Dataframe name: world_recovered 

Dataframe length: 254 

Number of columns: 279 

Dataframe's columns names, column data types, amount of distint (non null) values
and amount of null values for each column:
Out[22]:
Data_Type Amount_of_Distint_Values Amount_of_Null_Values
Province/State object 67 187
Country/Region object 189 0
Lat float64 253 0
Long float64 253 0
1/22/20 int64 2 0
1/23/20 int64 3 0
1/24/20 int64 4 0
1/25/20 int64 4 0
1/26/20 int64 4 0
1/27/20 int64 6 0
1/28/20 int64 7 0
1/29/20 int64 7 0
1/30/20 int64 7 0
1/31/20 int64 10 0
2/1/20 int64 12 0
2/2/20 int64 16 0
2/3/20 int64 18 0
2/4/20 int64 18 0
2/5/20 int64 20 0
2/6/20 int64 24 0
2/7/20 int64 28 0
2/8/20 int64 31 0
2/9/20 int64 29 0
2/10/20 int64 28 0
2/11/20 int64 32 0
2/12/20 int64 34 0
2/13/20 int64 35 0
2/14/20 int64 37 0
2/15/20 int64 38 0
2/16/20 int64 40 0
2/17/20 int64 39 0
2/18/20 int64 39 0
2/19/20 int64 42 0
2/20/20 int64 40 0
2/21/20 int64 43 0
2/22/20 int64 43 0
2/23/20 int64 43 0
2/24/20 int64 42 0
2/25/20 int64 45 0
2/26/20 int64 45 0
2/27/20 int64 45 0
2/28/20 int64 46 0
2/29/20 int64 50 0
3/1/20 int64 50 0
3/2/20 int64 49 0
3/3/20 int64 50 0
3/4/20 int64 49 0
3/5/20 int64 50 0
3/6/20 int64 50 0
3/7/20 int64 50 0
3/8/20 int64 51 0
3/9/20 int64 51 0
3/10/20 int64 52 0
3/11/20 int64 56 0
3/12/20 int64 58 0
3/13/20 int64 59 0
3/14/20 int64 56 0
3/15/20 int64 56 0
3/16/20 int64 55 0
3/17/20 int64 58 0
3/18/20 int64 61 0
3/19/20 int64 62 0
3/20/20 int64 66 0
3/21/20 int64 72 0
3/22/20 int64 71 0
3/23/20 int64 71 0
3/24/20 int64 78 0
3/25/20 int64 79 0
3/26/20 int64 87 0
3/27/20 int64 90 0
3/28/20 int64 97 0
3/29/20 int64 95 0
3/30/20 int64 105 0
3/31/20 int64 105 0
4/1/20 int64 113 0
4/2/20 int64 118 0
4/3/20 int64 125 0
4/4/20 int64 123 0
4/5/20 int64 126 0
4/6/20 int64 135 0
4/7/20 int64 141 0
4/8/20 int64 139 0
4/9/20 int64 142 0
4/10/20 int64 148 0
4/11/20 int64 150 0
4/12/20 int64 156 0
4/13/20 int64 154 0
4/14/20 int64 163 0
4/15/20 int64 158 0
4/16/20 int64 165 0
4/17/20 int64 172 0
4/18/20 int64 165 0
4/19/20 int64 175 0
4/20/20 int64 173 0
4/21/20 int64 176 0
4/22/20 int64 185 0
4/23/20 int64 187 0
4/24/20 int64 191 0
4/25/20 int64 189 0
4/26/20 int64 193 0
4/27/20 int64 191 0
4/28/20 int64 188 0
4/29/20 int64 195 0
4/30/20 int64 199 0
5/1/20 int64 199 0
5/2/20 int64 202 0
5/3/20 int64 201 0
5/4/20 int64 202 0
5/5/20 int64 204 0
5/6/20 int64 203 0
5/7/20 int64 201 0
5/8/20 int64 208 0
5/9/20 int64 208 0
5/10/20 int64 207 0
5/11/20 int64 209 0
5/12/20 int64 206 0
5/13/20 int64 209 0
5/14/20 int64 210 0
5/15/20 int64 211 0
5/16/20 int64 210 0
5/17/20 int64 213 0
5/18/20 int64 214 0
5/19/20 int64 214 0
5/20/20 int64 219 0
5/21/20 int64 212 0
5/22/20 int64 212 0
5/23/20 int64 211 0
5/24/20 int64 214 0
5/25/20 int64 216 0
5/26/20 int64 215 0
5/27/20 int64 217 0
5/28/20 int64 216 0
5/29/20 int64 217 0
5/30/20 int64 216 0
5/31/20 int64 218 0
6/1/20 int64 216 0
6/2/20 int64 217 0
6/3/20 int64 218 0
6/4/20 int64 214 0
6/5/20 int64 218 0
6/6/20 int64 222 0
6/7/20 int64 224 0
6/8/20 int64 226 0
6/9/20 int64 226 0
6/10/20 int64 225 0
6/11/20 int64 228 0
6/12/20 int64 228 0
6/13/20 int64 228 0
6/14/20 int64 230 0
6/15/20 int64 229 0
6/16/20 int64 228 0
6/17/20 int64 227 0
6/18/20 int64 226 0
6/19/20 int64 231 0
6/20/20 int64 230 0
6/21/20 int64 231 0
6/22/20 int64 233 0
6/23/20 int64 232 0
6/24/20 int64 233 0
6/25/20 int64 232 0
6/26/20 int64 231 0
6/27/20 int64 233 0
6/28/20 int64 231 0
6/29/20 int64 235 0
6/30/20 int64 231 0
7/1/20 int64 230 0
7/2/20 int64 228 0
7/3/20 int64 227 0
7/4/20 int64 230 0
7/5/20 int64 231 0
7/6/20 int64 227 0
7/7/20 int64 231 0
7/8/20 int64 230 0
7/9/20 int64 233 0
7/10/20 int64 236 0
7/11/20 int64 236 0
7/12/20 int64 237 0
7/13/20 int64 237 0
7/14/20 int64 231 0
7/15/20 int64 235 0
7/16/20 int64 233 0
7/17/20 int64 236 0
7/18/20 int64 231 0
7/19/20 int64 233 0
7/20/20 int64 230 0
7/21/20 int64 229 0
7/22/20 int64 233 0
7/23/20 int64 237 0
7/24/20 int64 234 0
7/25/20 int64 236 0
7/26/20 int64 234 0
7/27/20 int64 236 0
7/28/20 int64 235 0
7/29/20 int64 234 0
7/30/20 int64 234 0
7/31/20 int64 235 0
8/1/20 int64 236 0
8/2/20 int64 234 0
8/3/20 int64 235 0
8/4/20 int64 235 0
8/5/20 int64 237 0
8/6/20 int64 240 0
8/7/20 int64 240 0
8/8/20 int64 239 0
8/9/20 int64 237 0
8/10/20 int64 233 0
8/11/20 int64 240 0
8/12/20 int64 234 0
8/13/20 int64 231 0
8/14/20 int64 233 0
8/15/20 int64 239 0
8/16/20 int64 237 0
8/17/20 int64 236 0
8/18/20 int64 236 0
8/19/20 int64 234 0
8/20/20 int64 236 0
8/21/20 int64 237 0
8/22/20 int64 237 0
8/23/20 int64 235 0
8/24/20 int64 235 0
8/25/20 int64 240 0
8/26/20 int64 238 0
8/27/20 int64 236 0
8/28/20 int64 238 0
8/29/20 int64 240 0
8/30/20 int64 239 0
8/31/20 int64 238 0
9/1/20 int64 241 0
9/2/20 int64 238 0
9/3/20 int64 240 0
9/4/20 int64 242 0
9/5/20 int64 243 0
9/6/20 int64 243 0
9/7/20 int64 241 0
9/8/20 int64 239 0
9/9/20 int64 241 0
9/10/20 int64 241 0
9/11/20 int64 242 0
9/12/20 int64 244 0
9/13/20 int64 243 0
9/14/20 int64 240 0
9/15/20 int64 238 0
9/16/20 int64 238 0
9/17/20 int64 241 0
9/18/20 int64 242 0
9/19/20 int64 242 0
9/20/20 int64 242 0
9/21/20 int64 241 0
9/22/20 int64 240 0
9/23/20 int64 241 0
9/24/20 int64 244 0
9/25/20 int64 242 0
9/26/20 int64 243 0
9/27/20 int64 243 0
9/28/20 int64 242 0
9/29/20 int64 243 0
9/30/20 int64 243 0
10/1/20 int64 244 0
10/2/20 int64 241 0
10/3/20 int64 239 0
10/4/20 int64 242 0
10/5/20 int64 241 0
10/6/20 int64 241 0
10/7/20 int64 240 0
10/8/20 int64 241 0
10/9/20 int64 241 0
10/10/20 int64 239 0
10/11/20 int64 241 0
10/12/20 int64 242 0
10/13/20 int64 244 0
10/14/20 int64 243 0
10/15/20 int64 242 0
10/16/20 int64 241 0
10/17/20 int64 245 0
10/18/20 int64 243 0
10/19/20 int64 244 0
10/20/20 int64 243 0
10/21/20 int64 243 0
10/22/20 int64 243 0
In [23]:
# Showing basic dataframe info
df_basic_data(world_deceased)
Dataframe name: world_deceased 

Dataframe length: 267 

Number of columns: 279 

Dataframe's columns names, column data types, amount of distint (non null) values
and amount of null values for each column:
Out[23]:
Data_Type Amount_of_Distint_Values Amount_of_Null_Values
Province/State object 81 186
Country/Region object 189 0
Lat float64 263 0
Long float64 264 0
1/22/20 int64 2 0
1/23/20 int64 3 0
1/24/20 int64 3 0
1/25/20 int64 3 0
1/26/20 int64 3 0
1/27/20 int64 3 0
1/28/20 int64 3 0
1/29/20 int64 4 0
1/30/20 int64 4 0
1/31/20 int64 4 0
2/1/20 int64 4 0
2/2/20 int64 4 0
2/3/20 int64 4 0
2/4/20 int64 4 0
2/5/20 int64 4 0
2/6/20 int64 5 0
2/7/20 int64 5 0
2/8/20 int64 6 0
2/9/20 int64 6 0
2/10/20 int64 7 0
2/11/20 int64 8 0
2/12/20 int64 7 0
2/13/20 int64 9 0
2/14/20 int64 9 0
2/15/20 int64 10 0
2/16/20 int64 10 0
2/17/20 int64 10 0
2/18/20 int64 10 0
2/19/20 int64 10 0
2/20/20 int64 10 0
2/21/20 int64 10 0
2/22/20 int64 10 0
2/23/20 int64 11 0
2/24/20 int64 12 0
2/25/20 int64 13 0
2/26/20 int64 11 0
2/27/20 int64 13 0
2/28/20 int64 13 0
2/29/20 int64 15 0
3/1/20 int64 15 0
3/2/20 int64 15 0
3/3/20 int64 15 0
3/4/20 int64 16 0
3/5/20 int64 15 0
3/6/20 int64 17 0
3/7/20 int64 17 0
3/8/20 int64 17 0
3/9/20 int64 16 0
3/10/20 int64 17 0
3/11/20 int64 19 0
3/12/20 int64 20 0
3/13/20 int64 22 0
3/14/20 int64 22 0
3/15/20 int64 24 0
3/16/20 int64 25 0
3/17/20 int64 27 0
3/18/20 int64 27 0
3/19/20 int64 29 0
3/20/20 int64 31 0
3/21/20 int64 33 0
3/22/20 int64 34 0
3/23/20 int64 38 0
3/24/20 int64 40 0
3/25/20 int64 40 0
3/26/20 int64 46 0
3/27/20 int64 48 0
3/28/20 int64 50 0
3/29/20 int64 54 0
3/30/20 int64 55 0
3/31/20 int64 60 0
4/1/20 int64 61 0
4/2/20 int64 60 0
4/3/20 int64 68 0
4/4/20 int64 67 0
4/5/20 int64 69 0
4/6/20 int64 74 0
4/7/20 int64 75 0
4/8/20 int64 78 0
4/9/20 int64 79 0
4/10/20 int64 83 0
4/11/20 int64 82 0
4/12/20 int64 82 0
4/13/20 int64 84 0
4/14/20 int64 88 0
4/15/20 int64 87 0
4/16/20 int64 86 0
4/17/20 int64 93 0
4/18/20 int64 94 0
4/19/20 int64 92 0
4/20/20 int64 92 0
4/21/20 int64 91 0
4/22/20 int64 96 0
4/23/20 int64 97 0
4/24/20 int64 99 0
4/25/20 int64 95 0
4/26/20 int64 99 0
4/27/20 int64 101 0
4/28/20 int64 100 0
4/29/20 int64 103 0
4/30/20 int64 102 0
5/1/20 int64 102 0
5/2/20 int64 100 0
5/3/20 int64 105 0
5/4/20 int64 109 0
5/5/20 int64 106 0
5/6/20 int64 109 0
5/7/20 int64 110 0
5/8/20 int64 108 0
5/9/20 int64 104 0
5/10/20 int64 110 0
5/11/20 int64 107 0
5/12/20 int64 113 0
5/13/20 int64 113 0
5/14/20 int64 112 0
5/15/20 int64 118 0
5/16/20 int64 114 0
5/17/20 int64 119 0
5/18/20 int64 117 0
5/19/20 int64 116 0
5/20/20 int64 117 0
5/21/20 int64 114 0
5/22/20 int64 122 0
5/23/20 int64 122 0
5/24/20 int64 116 0
5/25/20 int64 118 0
5/26/20 int64 121 0
5/27/20 int64 123 0
5/28/20 int64 124 0
5/29/20 int64 127 0
5/30/20 int64 125 0
5/31/20 int64 122 0
6/1/20 int64 124 0
6/2/20 int64 125 0
6/3/20 int64 125 0
6/4/20 int64 125 0
6/5/20 int64 126 0
6/6/20 int64 123 0
6/7/20 int64 125 0
6/8/20 int64 127 0
6/9/20 int64 126 0
6/10/20 int64 127 0
6/11/20 int64 128 0
6/12/20 int64 128 0
6/13/20 int64 129 0
6/14/20 int64 128 0
6/15/20 int64 130 0
6/16/20 int64 129 0
6/17/20 int64 132 0
6/18/20 int64 130 0
6/19/20 int64 134 0
6/20/20 int64 130 0
6/21/20 int64 134 0
6/22/20 int64 138 0
6/23/20 int64 137 0
6/24/20 int64 135 0
6/25/20 int64 138 0
6/26/20 int64 139 0
6/27/20 int64 136 0
6/28/20 int64 136 0
6/29/20 int64 138 0
6/30/20 int64 141 0
7/1/20 int64 141 0
7/2/20 int64 137 0
7/3/20 int64 141 0
7/4/20 int64 140 0
7/5/20 int64 141 0
7/6/20 int64 142 0
7/7/20 int64 145 0
7/8/20 int64 143 0
7/9/20 int64 141 0
7/10/20 int64 143 0
7/11/20 int64 143 0
7/12/20 int64 140 0
7/13/20 int64 143 0
7/14/20 int64 139 0
7/15/20 int64 144 0
7/16/20 int64 137 0
7/17/20 int64 140 0
7/18/20 int64 149 0
7/19/20 int64 144 0
7/20/20 int64 145 0
7/21/20 int64 149 0
7/22/20 int64 146 0
7/23/20 int64 144 0
7/24/20 int64 147 0
7/25/20 int64 150 0
7/26/20 int64 151 0
7/27/20 int64 157 0
7/28/20 int64 154 0
7/29/20 int64 153 0
7/30/20 int64 156 0
7/31/20 int64 153 0
8/1/20 int64 150 0
8/2/20 int64 156 0
8/3/20 int64 152 0
8/4/20 int64 156 0
8/5/20 int64 153 0
8/6/20 int64 157 0
8/7/20 int64 152 0
8/8/20 int64 155 0
8/9/20 int64 153 0
8/10/20 int64 156 0
8/11/20 int64 153 0
8/12/20 int64 156 0
8/13/20 int64 161 0
8/14/20 int64 156 0
8/15/20 int64 153 0
8/16/20 int64 153 0
8/17/20 int64 158 0
8/18/20 int64 161 0
8/19/20 int64 161 0
8/20/20 int64 156 0
8/21/20 int64 158 0
8/22/20 int64 157 0
8/23/20 int64 159 0
8/24/20 int64 158 0
8/25/20 int64 163 0
8/26/20 int64 167 0
8/27/20 int64 165 0
8/28/20 int64 167 0
8/29/20 int64 168 0
8/30/20 int64 169 0
8/31/20 int64 168 0
9/1/20 int64 167 0
9/2/20 int64 168 0
9/3/20 int64 163 0
9/4/20 int64 166 0
9/5/20 int64 168 0
9/6/20 int64 169 0
9/7/20 int64 164 0
9/8/20 int64 169 0
9/9/20 int64 164 0
9/10/20 int64 162 0
9/11/20 int64 164 0
9/12/20 int64 166 0
9/13/20 int64 164 0
9/14/20 int64 163 0
9/15/20 int64 165 0
9/16/20 int64 167 0
9/17/20 int64 167 0
9/18/20 int64 169 0
9/19/20 int64 168 0
9/20/20 int64 169 0
9/21/20 int64 171 0
9/22/20 int64 167 0
9/23/20 int64 164 0
9/24/20 int64 167 0
9/25/20 int64 170 0
9/26/20 int64 170 0
9/27/20 int64 172 0
9/28/20 int64 168 0
9/29/20 int64 170 0
9/30/20 int64 169 0
10/1/20 int64 166 0
10/2/20 int64 168 0
10/3/20 int64 168 0
10/4/20 int64 167 0
10/5/20 int64 170 0
10/6/20 int64 168 0
10/7/20 int64 168 0
10/8/20 int64 170 0
10/9/20 int64 171 0
10/10/20 int64 172 0
10/11/20 int64 169 0
10/12/20 int64 174 0
10/13/20 int64 175 0
10/14/20 int64 169 0
10/15/20 int64 171 0
10/16/20 int64 169 0
10/17/20 int64 172 0
10/18/20 int64 174 0
10/19/20 int64 173 0
10/20/20 int64 174 0
10/21/20 int64 172 0
10/22/20 int64 176 0
In [24]:
# Showing basic dataframe info
df_basic_data(daily_report)
Dataframe name: daily_report 

Dataframe length: 3958 

Number of columns: 14 

Dataframe's columns names, column data types, amount of distint (non null) values
and amount of null values for each column:
Out[24]:
Data_Type Amount_of_Distint_Values Amount_of_Null_Values
FIPS float64 3261 697
Admin2 object 1921 692
Province_State object 562 169
Country_Region object 189 0
Last_Update object 3 0
Lat float64 3876 81
Long_ float64 3866 81
Confirmed int64 2351 0
Deaths int64 628 0
Recovered int64 594 0
Active float64 2196 4
Combined_Key object 3958 0
Incidence_Rate float64 3871 81
Case-Fatality_Ratio float64 2885 44
In [25]:
# Checking how data looks like
print("world_confirmed")
world_confirmed.head()
world_confirmed
Out[25]:
Province/State Country/Region Lat Long 1/22/20 1/23/20 1/24/20 1/25/20 1/26/20 1/27/20 1/28/20 1/29/20 1/30/20 1/31/20 2/1/20 2/2/20 2/3/20 2/4/20 2/5/20 2/6/20 2/7/20 2/8/20 2/9/20 2/10/20 2/11/20 2/12/20 2/13/20 2/14/20 2/15/20 2/16/20 2/17/20 2/18/20 2/19/20 2/20/20 2/21/20 2/22/20 2/23/20 2/24/20 2/25/20 2/26/20 2/27/20 2/28/20 2/29/20 3/1/20 3/2/20 3/3/20 3/4/20 3/5/20 3/6/20 3/7/20 3/8/20 3/9/20 3/10/20 3/11/20 3/12/20 3/13/20 3/14/20 3/15/20 3/16/20 3/17/20 3/18/20 3/19/20 3/20/20 3/21/20 3/22/20 3/23/20 3/24/20 3/25/20 3/26/20 3/27/20 3/28/20 3/29/20 3/30/20 3/31/20 4/1/20 4/2/20 4/3/20 4/4/20 4/5/20 4/6/20 4/7/20 4/8/20 4/9/20 4/10/20 4/11/20 4/12/20 4/13/20 4/14/20 4/15/20 4/16/20 4/17/20 4/18/20 4/19/20 4/20/20 4/21/20 4/22/20 4/23/20 4/24/20 4/25/20 4/26/20 4/27/20 4/28/20 4/29/20 4/30/20 5/1/20 5/2/20 5/3/20 5/4/20 5/5/20 5/6/20 5/7/20 5/8/20 5/9/20 5/10/20 5/11/20 5/12/20 5/13/20 5/14/20 5/15/20 5/16/20 5/17/20 5/18/20 5/19/20 5/20/20 5/21/20 5/22/20 5/23/20 5/24/20 5/25/20 5/26/20 5/27/20 5/28/20 5/29/20 5/30/20 5/31/20 6/1/20 6/2/20 6/3/20 6/4/20 6/5/20 6/6/20 6/7/20 6/8/20 6/9/20 6/10/20 6/11/20 6/12/20 6/13/20 6/14/20 6/15/20 6/16/20 6/17/20 6/18/20 6/19/20 6/20/20 6/21/20 6/22/20 6/23/20 6/24/20 6/25/20 6/26/20 6/27/20 6/28/20 6/29/20 6/30/20 7/1/20 7/2/20 7/3/20 7/4/20 7/5/20 7/6/20 7/7/20 7/8/20 7/9/20 7/10/20 7/11/20 7/12/20 7/13/20 7/14/20 7/15/20 7/16/20 7/17/20 7/18/20 7/19/20 7/20/20 7/21/20 7/22/20 7/23/20 7/24/20 7/25/20 7/26/20 7/27/20 7/28/20 7/29/20 7/30/20 7/31/20 8/1/20 8/2/20 8/3/20 8/4/20 8/5/20 8/6/20 8/7/20 8/8/20 8/9/20 8/10/20 8/11/20 8/12/20 8/13/20 8/14/20 8/15/20 8/16/20 8/17/20 8/18/20 8/19/20 8/20/20 8/21/20 8/22/20 8/23/20 8/24/20 8/25/20 8/26/20 8/27/20 8/28/20 8/29/20 8/30/20 8/31/20 9/1/20 9/2/20 9/3/20 9/4/20 9/5/20 9/6/20 9/7/20 9/8/20 9/9/20 9/10/20 9/11/20 9/12/20 9/13/20 9/14/20 9/15/20 9/16/20 9/17/20 9/18/20 9/19/20 9/20/20 9/21/20 9/22/20 9/23/20 9/24/20 9/25/20 9/26/20 9/27/20 9/28/20 9/29/20 9/30/20 10/1/20 10/2/20 10/3/20 10/4/20 10/5/20 10/6/20 10/7/20 10/8/20 10/9/20 10/10/20 10/11/20 10/12/20 10/13/20 10/14/20 10/15/20 10/16/20 10/17/20 10/18/20 10/19/20 10/20/20 10/21/20 10/22/20
0 NaN Afghanistan 33.93911 67.709953 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 1 1 1 1 1 1 1 1 1 1 1 1 4 4 5 7 7 7 11 16 21 22 22 22 24 24 40 40 74 84 94 110 110 120 170 174 237 273 281 299 349 367 423 444 484 521 555 607 665 714 784 840 906 933 996 1026 1092 1176 1279 1351 1463 1531 1703 1828 1939 2171 2335 2469 2704 2894 3224 3392 3563 3778 4033 4402 4687 4963 5226 5639 6053 6402 6664 7072 7653 8145 8676 9216 9998 10582 11173 11831 12456 13036 13659 14525 15205 15750 16509 17267 18054 18969 19551 20342 20917 21459 22142 22890 23546 24102 24766 25527 26310 26874 27532 27878 28424 28833 29157 29481 29640 30175 30451 30616 30967 31238 31517 31836 32022 32324 32672 32951 33190 33384 33594 33908 34194 34366 34451 34455 34740 34994 35070 35229 35301 35475 35526 35615 35727 35928 35981 36036 36157 36263 36368 36471 36542 36675 36710 36710 36747 36782 36829 36896 37015 37054 37054 37162 37269 37345 37424 37431 37551 37596 37599 37599 37599 37856 37894 37953 37999 38054 38070 38113 38129 38140 38143 38162 38165 38196 38243 38288 38304 38324 38398 38494 38520 38544 38572 38606 38641 38716 38772 38815 38855 38872 38883 38919 39044 39074 39096 39145 39170 39186 39192 39227 39233 39254 39268 39285 39290 39297 39341 39422 39486 39548 39616 39693 39703 39799 39870 39928 39994 40026 40073 40141 40200 40287 40357 40510 40626
1 NaN Albania 41.15330 20.168300 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 2 10 12 23 33 38 42 51 55 59 64 70 76 89 104 123 146 174 186 197 212 223 243 259 277 304 333 361 377 383 400 409 416 433 446 467 475 494 518 539 548 562 584 609 634 663 678 712 726 736 750 766 773 782 789 795 803 820 832 842 850 856 868 872 876 880 898 916 933 946 948 949 964 969 981 989 998 1004 1029 1050 1076 1099 1122 1137 1143 1164 1184 1197 1212 1232 1246 1263 1299 1341 1385 1416 1464 1521 1590 1672 1722 1788 1838 1891 1962 1995 2047 2114 2192 2269 2330 2402 2466 2535 2580 2662 2752 2819 2893 2964 3038 3106 3188 3278 3371 3454 3571 3667 3752 3851 3906 4008 4090 4171 4290 4358 4466 4570 4637 4763 4880 4997 5105 5197 5276 5396 5519 5620 5750 5889 6016 6151 6275 6411 6536 6676 6817 6971 7117 7260 7380 7499 7654 7812 7967 8119 8275 8427 8605 8759 8927 9083 9195 9279 9380 9513 9606 9728 9844 9967 10102 10255 10406 10553 10704 10860 11021 11185 11353 11520 11672 11816 11948 12073 12226 12385 12535 12666 12787 12921 13045 13153 13259 13391 13518 13649 13806 13965 14117 14266 14410 14568 14730 14899 15066 15231 15399 15570 15752 15955 16212 16501 16774 17055 17350 17651 17948 18250
2 NaN Algeria 28.03390 1.659600 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 1 1 1 1 1 3 5 12 12 17 17 19 20 20 20 24 26 37 48 54 60 74 87 90 139 201 230 264 302 367 409 454 511 584 716 847 986 1171 1251 1320 1423 1468 1572 1666 1761 1825 1914 1983 2070 2160 2268 2418 2534 2629 2718 2811 2910 3007 3127 3256 3382 3517 3649 3848 4006 4154 4295 4474 4648 4838 4997 5182 5369 5558 5723 5891 6067 6253 6442 6629 6821 7019 7201 7377 7542 7728 7918 8113 8306 8503 8697 8857 8997 9134 9267 9394 9513 9626 9733 9831 9935 10050 10154 10265 10382 10484 10589 10698 10810 10919 11031 11147 11268 11385 11504 11631 11771 11920 12076 12248 12445 12685 12968 13273 13571 13907 14272 14657 15070 15500 15941 16404 16879 17348 17808 18242 18712 19195 19689 20216 20770 21355 21948 22549 23084 23691 24278 24872 25484 26159 26764 27357 27973 28615 29229 29831 30394 30950 31465 31972 32504 33055 33626 34155 34693 35160 35712 36204 36699 37187 37664 38133 38583 39025 39444 39847 40258 40667 41068 41460 41858 42228 42619 43016 43403 43781 44146 44494 44833 45158 45469 45773 46071 46364 46653 46938 47216 47488 47752 48007 48254 48496 48734 48966 49194 49413 49623 49826 50023 50214 50400 50579 50754 50914 51067 51213 51368 51530 51690 51847 51995 52136 52270 52399 52520 52658 52804 52940 53072 53325 53399 53584 53777 53998 54203 54402 54616 54829 55081 55357
3 NaN Andorra 42.50630 1.521800 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 1 1 1 1 1 1 1 1 1 1 1 1 1 2 39 39 53 75 88 113 133 164 188 224 267 308 334 370 376 390 428 439 466 501 525 545 564 583 601 601 638 646 659 673 673 696 704 713 717 717 723 723 731 738 738 743 743 743 745 745 747 748 750 751 751 752 752 754 755 755 758 760 761 761 761 761 761 761 762 762 762 762 762 763 763 763 763 764 764 764 765 844 851 852 852 852 852 852 852 852 852 853 853 853 853 854 854 855 855 855 855 855 855 855 855 855 855 855 855 855 855 855 855 855 855 855 855 855 855 855 855 855 858 861 862 877 880 880 880 884 884 889 889 897 897 897 907 907 918 922 925 925 925 937 939 939 944 955 955 955 963 963 977 981 989 989 989 1005 1005 1024 1024 1045 1045 1045 1060 1060 1098 1098 1124 1124 1124 1176 1184 1199 1199 1215 1215 1215 1261 1261 1301 1301 1344 1344 1344 1438 1438 1483 1483 1564 1564 1564 1681 1681 1753 1753 1836 1836 1836 1966 1966 2050 2050 2110 2110 2110 2370 2370 2568 2568 2696 2696 2696 2995 2995 3190 3190 3377 3377 3377 3623 3623 3811 3811
4 NaN Angola -11.20270 17.873900 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 2 2 3 3 3 4 4 5 7 7 7 8 8 8 10 14 16 17 19 19 19 19 19 19 19 19 19 19 24 24 24 24 25 25 25 25 26 27 27 27 27 30 35 35 35 36 36 36 43 43 45 45 45 45 48 48 48 48 50 52 52 58 60 61 69 70 70 71 74 81 84 86 86 86 86 86 86 88 91 92 96 113 118 130 138 140 142 148 155 166 172 176 183 186 189 197 212 212 259 267 276 284 291 315 328 346 346 346 386 386 396 458 462 506 525 541 576 607 638 687 705 749 779 812 851 880 916 932 950 1000 1078 1109 1148 1164 1199 1280 1344 1395 1483 1538 1572 1672 1679 1735 1762 1815 1852 1879 1906 1935 1966 2015 2044 2068 2134 2171 2222 2283 2332 2415 2471 2551 2624 2654 2729 2777 2805 2876 2935 2965 2981 3033 3092 3217 3279 3335 3388 3439 3569 3675 3789 3848 3901 3991 4117 4236 4363 4475 4590 4672 4718 4797 4905 4972 5114 5211 5370 5402 5530 5725 5725 5958 6031 6246 6366 6488 6680 6846 7096 7222 7462 7622 7829 8049 8338 8582
In [26]:
# Checking how data looks like
print("world_recovered")
world_recovered.head()
world_recovered
Out[26]:
Province/State Country/Region Lat Long 1/22/20 1/23/20 1/24/20 1/25/20 1/26/20 1/27/20 1/28/20 1/29/20 1/30/20 1/31/20 2/1/20 2/2/20 2/3/20 2/4/20 2/5/20 2/6/20 2/7/20 2/8/20 2/9/20 2/10/20 2/11/20 2/12/20 2/13/20 2/14/20 2/15/20 2/16/20 2/17/20 2/18/20 2/19/20 2/20/20 2/21/20 2/22/20 2/23/20 2/24/20 2/25/20 2/26/20 2/27/20 2/28/20 2/29/20 3/1/20 3/2/20 3/3/20 3/4/20 3/5/20 3/6/20 3/7/20 3/8/20 3/9/20 3/10/20 3/11/20 3/12/20 3/13/20 3/14/20 3/15/20 3/16/20 3/17/20 3/18/20 3/19/20 3/20/20 3/21/20 3/22/20 3/23/20 3/24/20 3/25/20 3/26/20 3/27/20 3/28/20 3/29/20 3/30/20 3/31/20 4/1/20 4/2/20 4/3/20 4/4/20 4/5/20 4/6/20 4/7/20 4/8/20 4/9/20 4/10/20 4/11/20 4/12/20 4/13/20 4/14/20 4/15/20 4/16/20 4/17/20 4/18/20 4/19/20 4/20/20 4/21/20 4/22/20 4/23/20 4/24/20 4/25/20 4/26/20 4/27/20 4/28/20 4/29/20 4/30/20 5/1/20 5/2/20 5/3/20 5/4/20 5/5/20 5/6/20 5/7/20 5/8/20 5/9/20 5/10/20 5/11/20 5/12/20 5/13/20 5/14/20 5/15/20 5/16/20 5/17/20 5/18/20 5/19/20 5/20/20 5/21/20 5/22/20 5/23/20 5/24/20 5/25/20 5/26/20 5/27/20 5/28/20 5/29/20 5/30/20 5/31/20 6/1/20 6/2/20 6/3/20 6/4/20 6/5/20 6/6/20 6/7/20 6/8/20 6/9/20 6/10/20 6/11/20 6/12/20 6/13/20 6/14/20 6/15/20 6/16/20 6/17/20 6/18/20 6/19/20 6/20/20 6/21/20 6/22/20 6/23/20 6/24/20 6/25/20 6/26/20 6/27/20 6/28/20 6/29/20 6/30/20 7/1/20 7/2/20 7/3/20 7/4/20 7/5/20 7/6/20 7/7/20 7/8/20 7/9/20 7/10/20 7/11/20 7/12/20 7/13/20 7/14/20 7/15/20 7/16/20 7/17/20 7/18/20 7/19/20 7/20/20 7/21/20 7/22/20 7/23/20 7/24/20 7/25/20 7/26/20 7/27/20 7/28/20 7/29/20 7/30/20 7/31/20 8/1/20 8/2/20 8/3/20 8/4/20 8/5/20 8/6/20 8/7/20 8/8/20 8/9/20 8/10/20 8/11/20 8/12/20 8/13/20 8/14/20 8/15/20 8/16/20 8/17/20 8/18/20 8/19/20 8/20/20 8/21/20 8/22/20 8/23/20 8/24/20 8/25/20 8/26/20 8/27/20 8/28/20 8/29/20 8/30/20 8/31/20 9/1/20 9/2/20 9/3/20 9/4/20 9/5/20 9/6/20 9/7/20 9/8/20 9/9/20 9/10/20 9/11/20 9/12/20 9/13/20 9/14/20 9/15/20 9/16/20 9/17/20 9/18/20 9/19/20 9/20/20 9/21/20 9/22/20 9/23/20 9/24/20 9/25/20 9/26/20 9/27/20 9/28/20 9/29/20 9/30/20 10/1/20 10/2/20 10/3/20 10/4/20 10/5/20 10/6/20 10/7/20 10/8/20 10/9/20 10/10/20 10/11/20 10/12/20 10/13/20 10/14/20 10/15/20 10/16/20 10/17/20 10/18/20 10/19/20 10/20/20 10/21/20 10/22/20
0 NaN Afghanistan 33.93911 67.709953 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 1 1 1 1 1 1 1 1 2 2 2 2 2 2 5 5 10 10 10 15 18 18 29 32 32 32 32 32 40 43 54 99 112 131 135 150 166 179 188 188 207 220 228 252 260 310 331 345 397 421 458 468 472 502 558 558 610 648 691 745 745 778 801 850 930 938 996 1040 1075 1097 1128 1138 1209 1259 1303 1328 1428 1450 1522 1585 1762 1830 1875 2171 2651 3013 3326 3928 4201 4725 5164 5508 6158 7660 7962 8292 8764 8841 9260 9869 10174 10306 10674 12604 13934 14131 15651 16041 17331 19164 19366 20103 20179 20700 20847 20882 21135 21216 21254 21454 22456 22824 23151 23273 23634 23741 23741 23924 24550 24602 24793 25180 25198 25358 25389 25471 25509 25509 25510 25669 25669 25742 25840 25903 25960 25960 26228 26415 26694 26714 26714 27166 27166 27166 27166 27166 27681 28016 28016 28180 28360 28440 29042 29046 29059 29063 29089 29089 29231 29315 29390 29713 30082 30537 30557 30715 31048 31129 31154 31234 31638 32073 32098 32503 32505 32576 32576 32576 32576 32576 32610 32619 32619 32635 32642 32642 32746 32789 32842 32842 32842 32852 32879 32977 33045 33058 33058 33064 33114 33118 33308 33354 33447 33516 33561 33614 33760 33790 33824 33831
1 NaN Albania 41.15330 20.168300 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 2 2 2 10 17 17 31 31 33 44 52 67 76 89 99 104 116 131 154 165 182 197 217 232 248 251 277 283 302 314 327 345 356 385 394 403 410 422 431 455 470 488 519 531 543 570 595 605 620 627 650 654 682 688 694 705 714 715 727 742 758 771 777 783 789 795 803 812 823 851 857 872 877 891 898 898 910 925 938 945 960 980 1001 1034 1039 1044 1055 1064 1077 1086 1114 1126 1134 1159 1195 1217 1250 1298 1346 1384 1438 1459 1516 1559 1592 1637 1657 1702 1744 1791 1832 1875 1881 1946 2014 2062 2091 2137 2214 2264 2311 2352 2397 2463 2523 2608 2637 2682 2745 2789 2830 2883 2952 2961 3018 3031 3031 3123 3155 3227 3268 3342 3379 3480 3552 3616 3695 3746 3794 3816 3871 3928 3986 4096 4184 4332 4413 4530 4633 4791 4923 5020 5139 5214 5441 5582 5732 5882 5976 6106 6186 6239 6284 6346 6443 6494 6569 6615 6668 6733 6788 6831 6888 6940 6995 7042 7139 7239 7309 7397 7397 7629 7732 7847 8077 8342 8536 8675 8825 8965 9115 9215 9304 9406 9500 9585 9675 9762 9864 9957 10001 10071 10167 10225 10341 10395
2 NaN Algeria 28.03390 1.659600 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 8 8 12 12 12 12 12 32 32 32 65 65 24 65 29 29 31 31 37 46 61 61 62 90 90 90 113 237 347 405 460 591 601 691 708 783 846 894 1047 1099 1152 1204 1355 1408 1479 1508 1558 1651 1702 1779 1821 1872 1936 1998 2067 2197 2323 2467 2546 2678 2841 2998 3058 3158 3271 3409 3507 3625 3746 3968 4062 4256 4426 4784 4747 4918 5129 5277 5422 5549 5748 5894 6067 6218 6297 6453 6631 6717 6799 6951 7074 7255 7322 7420 7606 7735 7842 7943 8078 8196 8324 8422 8559 8674 8792 8920 9066 9202 9371 9674 9897 10040 10342 10832 11181 11492 11884 12094 12329 12637 13124 13124 13743 14019 14295 14792 15107 15430 15744 16051 16400 16646 16983 17369 17369 18076 18088 18837 19233 19592 20082 20537 20988 21419 21901 22375 22802 23238 23667 24083 24506 24920 25263 25627 26004 26308 26644 27017 27347 27653 27971 28281 28587 28874 29142 29369 29587 29886 30157 30436 30717 30978 31244 31493 31746 32006 32259 32481 32745 32985 33183 33379 33562 33723 33875 34037 34204 34385 34517 34675 34818 34923 35047 35180 35307 35428 35544 35654 35756 35860 35962 36063 36174 36282 36385 36482 36578 36672 36763 36857 36958 37067 37170 37170 37382 37492 37603 37603 37856 37971 38088 38215 38346 38482 38618
3 NaN Andorra 42.50630 1.521800 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 0 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 10 10 10 10 16 21 26 31 39 52 58 71 71 128 128 128 169 169 191 205 235 248 282 309 333 344 344 344 385 398 423 468 468 472 493 499 514 521 526 537 545 550 550 568 576 596 604 615 617 624 628 639 639 652 653 653 663 676 676 681 684 692 694 698 733 735 738 741 741 744 751 757 759 780 781 781 781 789 789 791 792 792 792 792 796 797 797 797 799 799 799 799 799 799 800 800 800 800 800 800 802 802 803 803 803 803 803 803 803 803 803 803 803 803 803 803 803 803 803 803 803 804 806 807 807 807 821 825 825 828 839 839 839 839 839 855 858 863 863 863 869 869 875 875 875 875 875 877 877 893 893 902 902 902 908 908 909 909 928 928 928 934 934 938 938 943 943 943 945 945 1054 1054 1164 1164 1164 1199 1199 1203 1203 1263 1263 1263 1265 1265 1432 1432 1540 1540 1540 1615 1615 1715 1715 1814 1814 1814 1928 1928 2011 2011 2057 2057 2057 2273 2273 2470 2470
4 NaN Angola -11.20270 17.873900 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 1 1 1 2 2 2 2 2 2 2 4 4 4 5 5 5 5 6 6 6 6 6 6 6 6 6 6 6 7 7 11 11 11 11 11 11 11 11 13 13 13 13 14 14 17 17 17 17 17 17 17 17 18 18 18 18 18 18 18 18 18 18 18 18 18 21 24 24 38 38 40 41 42 61 61 64 64 64 64 66 66 77 77 77 77 81 81 81 81 93 93 97 97 107 108 108 108 117 117 117 117 118 118 118 118 124 124 199 210 221 221 221 221 236 241 242 242 242 266 301 395 437 460 461 476 503 506 520 544 564 567 569 575 577 577 584 628 628 632 667 698 742 804 814 818 877 977 977 1335 1028 1041 1063 1071 1084 1115 1144 1167 1192 1198 1215 1215 1245 1277 1288 1289 1301 1324 1332 1401 1405 1443 1445 1445 1449 1462 1473 1503 1554 1639 1707 1813 1833 1941 2082 2215 2436 2577 2591 2598 2598 2635 2685 2716 2743 2744 2761 2801 2928 3012 3022 3030 3031 3037 3040 3305
In [27]:
# Checking how data looks like
print("world_deceased")
world_deceased.head()
world_deceased
Out[27]:
Province/State Country/Region Lat Long 1/22/20 1/23/20 1/24/20 1/25/20 1/26/20 1/27/20 1/28/20 1/29/20 1/30/20 1/31/20 2/1/20 2/2/20 2/3/20 2/4/20 2/5/20 2/6/20 2/7/20 2/8/20 2/9/20 2/10/20 2/11/20 2/12/20 2/13/20 2/14/20 2/15/20 2/16/20 2/17/20 2/18/20 2/19/20 2/20/20 2/21/20 2/22/20 2/23/20 2/24/20 2/25/20 2/26/20 2/27/20 2/28/20 2/29/20 3/1/20 3/2/20 3/3/20 3/4/20 3/5/20 3/6/20 3/7/20 3/8/20 3/9/20 3/10/20 3/11/20 3/12/20 3/13/20 3/14/20 3/15/20 3/16/20 3/17/20 3/18/20 3/19/20 3/20/20 3/21/20 3/22/20 3/23/20 3/24/20 3/25/20 3/26/20 3/27/20 3/28/20 3/29/20 3/30/20 3/31/20 4/1/20 4/2/20 4/3/20 4/4/20 4/5/20 4/6/20 4/7/20 4/8/20 4/9/20 4/10/20 4/11/20 4/12/20 4/13/20 4/14/20 4/15/20 4/16/20 4/17/20 4/18/20 4/19/20 4/20/20 4/21/20 4/22/20 4/23/20 4/24/20 4/25/20 4/26/20 4/27/20 4/28/20 4/29/20 4/30/20 5/1/20 5/2/20 5/3/20 5/4/20 5/5/20 5/6/20 5/7/20 5/8/20 5/9/20 5/10/20 5/11/20 5/12/20 5/13/20 5/14/20 5/15/20 5/16/20 5/17/20 5/18/20 5/19/20 5/20/20 5/21/20 5/22/20 5/23/20 5/24/20 5/25/20 5/26/20 5/27/20 5/28/20 5/29/20 5/30/20 5/31/20 6/1/20 6/2/20 6/3/20 6/4/20 6/5/20 6/6/20 6/7/20 6/8/20 6/9/20 6/10/20 6/11/20 6/12/20 6/13/20 6/14/20 6/15/20 6/16/20 6/17/20 6/18/20 6/19/20 6/20/20 6/21/20 6/22/20 6/23/20 6/24/20 6/25/20 6/26/20 6/27/20 6/28/20 6/29/20 6/30/20 7/1/20 7/2/20 7/3/20 7/4/20 7/5/20 7/6/20 7/7/20 7/8/20 7/9/20 7/10/20 7/11/20 7/12/20 7/13/20 7/14/20 7/15/20 7/16/20 7/17/20 7/18/20 7/19/20 7/20/20 7/21/20 7/22/20 7/23/20 7/24/20 7/25/20 7/26/20 7/27/20 7/28/20 7/29/20 7/30/20 7/31/20 8/1/20 8/2/20 8/3/20 8/4/20 8/5/20 8/6/20 8/7/20 8/8/20 8/9/20 8/10/20 8/11/20 8/12/20 8/13/20 8/14/20 8/15/20 8/16/20 8/17/20 8/18/20 8/19/20 8/20/20 8/21/20 8/22/20 8/23/20 8/24/20 8/25/20 8/26/20 8/27/20 8/28/20 8/29/20 8/30/20 8/31/20 9/1/20 9/2/20 9/3/20 9/4/20 9/5/20 9/6/20 9/7/20 9/8/20 9/9/20 9/10/20 9/11/20 9/12/20 9/13/20 9/14/20 9/15/20 9/16/20 9/17/20 9/18/20 9/19/20 9/20/20 9/21/20 9/22/20 9/23/20 9/24/20 9/25/20 9/26/20 9/27/20 9/28/20 9/29/20 9/30/20 10/1/20 10/2/20 10/3/20 10/4/20 10/5/20 10/6/20 10/7/20 10/8/20 10/9/20 10/10/20 10/11/20 10/12/20 10/13/20 10/14/20 10/15/20 10/16/20 10/17/20 10/18/20 10/19/20 10/20/20 10/21/20 10/22/20
0 NaN Afghanistan 33.93911 67.709953 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 1 1 2 4 4 4 4 4 4 4 6 6 7 7 11 14 14 15 15 18 18 21 23 25 30 30 30 33 36 36 40 42 43 47 50 57 58 60 64 68 72 85 90 95 104 106 109 115 120 122 127 132 136 153 168 169 173 178 187 193 205 216 218 219 220 227 235 246 249 257 265 270 294 300 309 327 357 369 384 405 426 446 451 471 478 491 504 546 548 569 581 598 618 639 675 683 703 721 733 746 774 807 819 826 864 898 920 936 957 971 994 1010 1012 1048 1094 1113 1147 1164 1181 1185 1186 1190 1211 1225 1248 1259 1269 1270 1271 1271 1272 1283 1284 1288 1288 1294 1298 1307 1312 1312 1328 1344 1354 1363 1363 1370 1375 1375 1375 1375 1385 1385 1385 1387 1389 1397 1401 1401 1402 1402 1402 1402 1406 1409 1409 1409 1409 1412 1415 1418 1420 1420 1420 1420 1420 1425 1426 1436 1436 1437 1437 1441 1444 1445 1446 1451 1451 1453 1453 1455 1458 1458 1458 1458 1462 1462 1466 1467 1469 1470 1472 1473 1477 1479 1480 1481 1481 1485 1488 1492 1497 1499 1501 1505
1 NaN Albania 41.15330 20.168300 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 1 1 1 1 1 1 2 2 2 2 2 4 5 5 6 8 10 10 11 15 15 16 17 20 20 21 22 22 23 23 23 23 23 24 25 26 26 26 26 26 26 27 27 27 27 28 28 30 30 31 31 31 31 31 31 31 31 31 31 31 31 31 31 31 31 31 31 31 31 31 31 31 31 32 32 33 33 33 33 33 33 33 33 33 33 33 34 34 34 34 34 35 36 36 36 36 37 38 39 42 43 44 44 45 47 49 51 53 55 58 62 65 69 72 74 76 79 81 83 83 85 89 93 95 97 101 104 107 111 112 113 117 120 123 128 134 138 144 148 150 154 157 161 166 172 176 182 188 189 193 199 200 205 208 213 219 225 228 230 232 234 238 240 245 250 254 259 263 266 271 275 280 284 290 296 301 306 312 316 319 321 322 324 327 330 334 338 340 343 347 353 358 362 364 367 370 370 373 375 377 380 384 387 388 389 392 396 400 403 407 411 413 416 420 424 429 434 439 443 448 451 454 458 462 465
2 NaN Algeria 28.03390 1.659600 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 2 3 4 4 4 7 9 11 15 17 17 19 21 25 26 29 31 35 44 58 86 105 130 152 173 193 205 235 256 275 293 313 326 336 348 364 367 375 384 392 402 407 415 419 425 432 437 444 450 453 459 463 465 470 476 483 488 494 502 507 515 522 529 536 542 548 555 561 568 575 582 592 600 609 617 623 630 638 646 653 661 667 673 681 690 698 707 715 724 732 741 751 760 767 777 788 799 811 825 837 845 852 861 869 878 885 892 897 905 912 920 928 937 946 952 959 968 978 988 996 1004 1011 1018 1028 1040 1052 1057 1068 1078 1087 1100 1111 1124 1136 1146 1155 1163 1174 1186 1200 1210 1223 1231 1239 1248 1261 1273 1282 1293 1302 1312 1322 1333 1341 1351 1360 1370 1379 1391 1402 1411 1418 1424 1435 1446 1456 1465 1475 1483 1491 1501 1510 1518 1523 1529 1539 1549 1556 1562 1571 1581 1591 1599 1605 1612 1620 1632 1645 1654 1659 1665 1672 1679 1689 1698 1703 1707 1711 1714 1719 1726 1736 1741 1749 1756 1760 1768 1768 1771 1783 1789 1795 1801 1809 1818 1827 1827 1841 1846 1856 1865 1873 1880 1888
3 NaN Andorra 42.50630 1.521800 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 1 1 1 3 3 3 6 8 12 14 15 16 17 18 21 22 23 25 26 26 29 29 31 33 33 35 35 36 37 37 37 37 40 40 40 40 41 42 42 43 44 45 45 46 46 47 47 48 48 48 48 49 49 49 51 51 51 51 51 51 51 51 51 51 51 51 51 51 51 51 51 51 51 51 51 51 51 51 51 51 51 51 51 51 51 52 52 52 52 52 52 52 52 52 52 52 52 52 52 52 52 52 52 52 52 52 52 52 52 52 52 52 52 52 52 52 52 52 52 52 52 52 52 52 52 52 52 52 52 52 52 52 52 52 52 52 52 52 52 52 52 52 53 53 53 53 53 53 53 53 53 53 53 53 53 53 53 53 53 53 53 53 53 53 53 53 53 53 53 53 53 53 53 53 53 53 53 53 53 53 53 53 53 53 53 53 53 53 53 53 53 53 53 53 53 53 53 53 53 54 55 55 55 57 57 59 59 59 59 59 62 62 63 63
4 NaN Angola -11.20270 17.873900 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 3 3 3 3 3 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 5 5 6 6 6 6 7 8 8 9 9 10 10 10 10 10 10 11 11 13 15 17 18 19 19 19 21 21 22 23 23 26 26 26 27 28 29 29 29 29 30 33 33 35 39 40 41 47 48 51 52 54 55 58 59 62 64 67 70 75 78 80 80 80 86 86 88 88 90 92 93 94 94 96 100 102 103 105 106 107 107 108 109 112 113 115 117 117 120 124 126 130 131 132 134 136 139 143 144 147 147 152 154 155 159 162 167 171 174 176 179 183 185 189 193 195 199 211 211 208 212 218 218 219 222 227 228 234 241 247 248 251 255 260
In [28]:
# Checking how data looks like
print("daily_report")
daily_report.head()
daily_report
Out[28]:
FIPS Admin2 Province_State Country_Region Last_Update Lat Long_ Confirmed Deaths Recovered Active Combined_Key Incidence_Rate Case-Fatality_Ratio
0 NaN NaN NaN Afghanistan 2020-10-23 04:24:46 33.93911 67.709953 40626 1505 33831 5290.0 Afghanistan 104.360985 3.704524
1 NaN NaN NaN Albania 2020-10-23 04:24:46 41.15330 20.168300 18250 465 10395 7390.0 Albania 634.164987 2.547945
2 NaN NaN NaN Algeria 2020-10-23 04:24:46 28.03390 1.659600 55357 1888 38618 14851.0 Algeria 126.238731 3.410589
3 NaN NaN NaN Andorra 2020-10-23 04:24:46 42.50630 1.521800 3811 63 2470 1278.0 Andorra 4932.375591 1.653109
4 NaN NaN NaN Angola 2020-10-23 04:24:46 -11.20270 17.873900 8582 260 3305 5017.0 Angola 26.111879 3.029597
In [29]:
# Checking the Countries that are associated to more than one entry in the time series
print(world_confirmed['Country/Region'].value_counts()[world_confirmed['Country/Region']\
                                                    .value_counts() > 1].to_string())
China             33
Canada            14
France            11
United Kingdom    11
Australia          8
Netherlands        5
Denmark            3
In [30]:
# Checking the Countries that are associated to more than one entry in the daily report
print("Countries that are associated to more than one entry and number of entries\n")
print(daily_report['Country_Region'].value_counts()[daily_report['Country_Region']\
                                                    .value_counts() > 1].to_string())
Countries that are associated to more than one entry and number of entries

US                3272
Russia              83
Japan               49
India               37
Colombia            34
China               33
Mexico              32
Brazil              27
Ukraine             27
Peru                26
Sweden              21
Italy               21
Spain               20
Chile               17
Netherlands         17
Germany             17
United Kingdom      15
Canada              14
France              11
Australia            8
Pakistan             7
Denmark              3
In [31]:
# Checking the logic behind the classification in the time series
world_confirmed[world_confirmed['Country/Region'] == "Denmark"]
Out[31]:
Province/State Country/Region Lat Long 1/22/20 1/23/20 1/24/20 1/25/20 1/26/20 1/27/20 1/28/20 1/29/20 1/30/20 1/31/20 2/1/20 2/2/20 2/3/20 2/4/20 2/5/20 2/6/20 2/7/20 2/8/20 2/9/20 2/10/20 2/11/20 2/12/20 2/13/20 2/14/20 2/15/20 2/16/20 2/17/20 2/18/20 2/19/20 2/20/20 2/21/20 2/22/20 2/23/20 2/24/20 2/25/20 2/26/20 2/27/20 2/28/20 2/29/20 3/1/20 3/2/20 3/3/20 3/4/20 3/5/20 3/6/20 3/7/20 3/8/20 3/9/20 3/10/20 3/11/20 3/12/20 3/13/20 3/14/20 3/15/20 3/16/20 3/17/20 3/18/20 3/19/20 3/20/20 3/21/20 3/22/20 3/23/20 3/24/20 3/25/20 3/26/20 3/27/20 3/28/20 3/29/20 3/30/20 3/31/20 4/1/20 4/2/20 4/3/20 4/4/20 4/5/20 4/6/20 4/7/20 4/8/20 4/9/20 4/10/20 4/11/20 4/12/20 4/13/20 4/14/20 4/15/20 4/16/20 4/17/20 4/18/20 4/19/20 4/20/20 4/21/20 4/22/20 4/23/20 4/24/20 4/25/20 4/26/20 4/27/20 4/28/20 4/29/20 4/30/20 5/1/20 5/2/20 5/3/20 5/4/20 5/5/20 5/6/20 5/7/20 5/8/20 5/9/20 5/10/20 5/11/20 5/12/20 5/13/20 5/14/20 5/15/20 5/16/20 5/17/20 5/18/20 5/19/20 5/20/20 5/21/20 5/22/20 5/23/20 5/24/20 5/25/20 5/26/20 5/27/20 5/28/20 5/29/20 5/30/20 5/31/20 6/1/20 6/2/20 6/3/20 6/4/20 6/5/20 6/6/20 6/7/20 6/8/20 6/9/20 6/10/20 6/11/20 6/12/20 6/13/20 6/14/20 6/15/20 6/16/20 6/17/20 6/18/20 6/19/20 6/20/20 6/21/20 6/22/20 6/23/20 6/24/20 6/25/20 6/26/20 6/27/20 6/28/20 6/29/20 6/30/20 7/1/20 7/2/20 7/3/20 7/4/20 7/5/20 7/6/20 7/7/20 7/8/20 7/9/20 7/10/20 7/11/20 7/12/20 7/13/20 7/14/20 7/15/20 7/16/20 7/17/20 7/18/20 7/19/20 7/20/20 7/21/20 7/22/20 7/23/20 7/24/20 7/25/20 7/26/20 7/27/20 7/28/20 7/29/20 7/30/20 7/31/20 8/1/20 8/2/20 8/3/20 8/4/20 8/5/20 8/6/20 8/7/20 8/8/20 8/9/20 8/10/20 8/11/20 8/12/20 8/13/20 8/14/20 8/15/20 8/16/20 8/17/20 8/18/20 8/19/20 8/20/20 8/21/20 8/22/20 8/23/20 8/24/20 8/25/20 8/26/20 8/27/20 8/28/20 8/29/20 8/30/20 8/31/20 9/1/20 9/2/20 9/3/20 9/4/20 9/5/20 9/6/20 9/7/20 9/8/20 9/9/20 9/10/20 9/11/20 9/12/20 9/13/20 9/14/20 9/15/20 9/16/20 9/17/20 9/18/20 9/19/20 9/20/20 9/21/20 9/22/20 9/23/20 9/24/20 9/25/20 9/26/20 9/27/20 9/28/20 9/29/20 9/30/20 10/1/20 10/2/20 10/3/20 10/4/20 10/5/20 10/6/20 10/7/20 10/8/20 10/9/20 10/10/20 10/11/20 10/12/20 10/13/20 10/14/20 10/15/20 10/16/20 10/17/20 10/18/20 10/19/20 10/20/20 10/21/20 10/22/20
99 Faroe Islands Denmark 61.8926 -6.9118 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 1 1 1 2 2 2 2 2 3 9 11 18 47 58 72 80 92 115 118 122 132 140 144 155 159 168 169 173 177 179 181 181 183 184 184 184 184 184 184 184 184 184 184 184 184 185 185 185 185 187 187 187 187 187 187 187 187 187 187 187 187 187 187 187 187 187 187 187 187 187 187 187 187 187 187 187 187 187 187 187 187 187 187 187 187 187 187 187 187 187 187 187 187 187 187 187 187 187 187 187 187 187 187 187 187 187 187 187 187 187 187 187 187 187 187 187 187 187 187 187 187 187 188 188 188 188 188 188 188 188 188 188 188 188 188 188 191 191 191 191 191 191 192 214 214 220 220 225 225 225 225 225 227 241 266 291 295 303 306 318 339 362 365 370 372 373 377 382 383 384 384 384 410 411 411 411 411 411 411 411 411 411 412 413 413 413 413 414 415 415 416 418 423 423 428 428 429 430 431 434 437 448 451 455 458 460 460 460 463 467 472 472 473 474 475 475 476 477 477 477 477 477 477 478 480 482 483 485 485 488 488 490
100 Greenland Denmark 71.7069 -42.6043 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 1 1 2 2 2 4 4 5 6 6 10 10 10 10 10 10 10 10 11 11 11 11 11 11 11 11 11 11 11 11 11 11 11 11 11 11 11 11 11 11 11 11 11 11 11 11 11 11 11 11 11 11 11 11 11 11 11 11 11 11 11 11 11 11 11 11 11 11 12 12 12 13 13 13 13 13 13 13 13 13 13 13 13 13 13 13 13 13 13 13 13 13 13 13 13 13 13 13 13 13 13 13 13 13 13 13 13 13 13 13 13 13 13 13 13 13 13 13 13 13 13 13 13 13 13 13 13 13 13 13 13 13 14 14 14 14 14 14 14 14 14 14 14 14 14 14 14 14 14 14 14 14 14 14 14 14 14 14 14 14 14 14 14 14 14 14 14 14 14 14 14 14 14 14 14 14 14 14 14 14 14 14 14 14 14 14 14 14 14 14 14 14 14 14 14 14 14 14 14 14 14 14 14 14 15 15 16 16 16 16 16 16 16 16 16 16 16 16 16 17
101 NaN Denmark 56.2639 9.5018 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 1 3 4 4 6 10 10 23 23 35 90 262 442 615 801 827 864 914 977 1057 1151 1255 1326 1395 1450 1591 1724 1877 2046 2201 2395 2577 2860 3107 3386 3757 4077 4369 4681 5071 5402 5635 5819 5996 6174 6318 6511 6681 6879 7073 7242 7384 7515 7695 7912 8073 8210 8445 8575 8698 8851 9008 9158 9311 9407 9523 9670 9821 9938 10083 10218 10319 10429 10513 10591 10667 10713 10791 10858 10927 10968 11044 11117 11182 11230 11289 11360 11387 11428 11480 11512 11593 11633 11669 11699 11734 11771 11811 11875 11924 11948 11962 12001 12016 12035 12099 12139 12193 12217 12250 12294 12344 12391 12391 12391 12527 12561 12615 12636 12675 12675 12675 12751 12768 12794 12815 12832 12832 12832 12878 12888 12900 12916 12946 12946 12946 13037 13061 13092 13124 13173 13173 13173 13262 13302 13350 13390 13438 13438 13438 13547 13577 13634 13725 13789 13789 13789 13996 14073 14185 14306 14442 14442 14442 14815 14959 15070 15214 15379 15483 15617 15740 15855 15940 16056 16127 16239 16317 16397 16480 16537 16627 16700 16779 16891 16985 17084 17195 17374 17547 17736 17883 18113 18356 18607 18924 19216 19557 19890 20237 20571 20571 21393 21847 22436 22905 23323 23799 24357 24916 25594 26213 26637 27072 27464 27998 28396 28932 29302 29680 30057 30379 30710 31156 31638 32082 32422 32811 33101 33593 34023 34441 34941 35392 35844 36373 37003 37763
In [32]:
# Checking the logic behind the classification in the daily report
daily_report[daily_report['Country_Region'] == "Denmark"]
Out[32]:
FIPS Admin2 Province_State Country_Region Last_Update Lat Long_ Confirmed Deaths Recovered Active Combined_Key Incidence_Rate Case-Fatality_Ratio
174 NaN NaN Faroe Islands Denmark 2020-10-23 04:24:46 61.8926 -6.9118 490 0 473 17.0 Faroe Islands, Denmark 1002.762714 0.000000
175 NaN NaN Greenland Denmark 2020-10-23 04:24:46 71.7069 -42.6043 17 0 16 1.0 Greenland, Denmark 29.944339 0.000000
176 NaN NaN NaN Denmark 2020-10-23 04:24:46 56.2639 9.5018 37763 694 30877 6192.0 Denmark 651.962647 1.837778
In [33]:
# Checking the logic behind the classification in the daily report
daily_report[daily_report['Country_Region'] == "Italy"]
Out[33]:
FIPS Admin2 Province_State Country_Region Last_Update Lat Long_ Confirmed Deaths Recovered Active Combined_Key Incidence_Rate Case-Fatality_Ratio
276 NaN NaN Abruzzo Italy 2020-10-23 04:24:46 42.351222 13.398438 7091 501 3214 3376.0 Abruzzo, Italy 540.645634 7.065294
277 NaN NaN Basilicata Italy 2020-10-23 04:24:46 40.639471 15.805148 1481 42 611 828.0 Basilicata, Italy 263.116285 2.835922
278 NaN NaN Calabria Italy 2020-10-23 04:24:46 38.905976 16.594402 3286 105 1644 1537.0 Calabria, Italy 168.761116 3.195374
279 NaN NaN Campania Italy 2020-10-23 04:24:46 40.839566 14.250850 32025 551 8913 22561.0 Campania, Italy 551.994142 1.720531
280 NaN NaN Emilia-Romagna Italy 2020-10-23 04:24:46 44.494367 11.341721 43477 4537 27277 11663.0 Emilia-Romagna, Italy 974.934953 10.435403
281 NaN NaN Friuli Venezia Giulia Italy 2020-10-23 04:24:46 45.649435 13.768136 7075 368 4482 2225.0 Friuli Venezia Giulia, Italy 582.199108 5.201413
282 NaN NaN Lazio Italy 2020-10-23 04:24:46 41.892770 12.483667 29621 1070 10020 18531.0 Lazio, Italy 503.837164 3.612302
283 NaN NaN Liguria Italy 2020-10-23 04:24:46 44.411493 8.932699 20581 1673 13842 5066.0 Liguria, Italy 1327.258422 8.128857
284 NaN NaN Lombardia Italy 2020-10-23 04:24:46 45.466794 9.190347 138729 17152 88059 33518.0 Lombardia, Italy 1378.937226 12.363673
285 NaN NaN Marche Italy 2020-10-23 04:24:46 43.616760 13.518875 10192 998 6570 2624.0 Marche, Italy 668.209125 9.791994
286 NaN NaN Molise Italy 2020-10-23 04:24:46 41.557748 14.659161 1070 26 594 450.0 Molise, Italy 350.111414 2.429907
287 NaN NaN P.A. Bolzano Italy 2020-10-23 04:24:46 46.499335 11.356624 5549 296 2935 2318.0 P.A. Bolzano, Italy 1042.422011 5.334294
288 NaN NaN P.A. Trento Italy 2020-10-23 04:24:46 46.068935 11.121231 7319 423 5927 969.0 P.A. Trento, Italy 1351.820590 5.779478
289 NaN NaN Piemonte Italy 2020-10-23 04:24:46 45.073274 7.680687 49668 4227 30661 14780.0 Piemonte, Italy 1140.114122 8.510510
290 NaN NaN Puglia Italy 2020-10-23 04:24:46 41.125596 16.867367 12810 645 5886 6279.0 Puglia, Italy 317.940717 5.035129
291 NaN NaN Sardegna Italy 2020-10-23 04:24:46 39.215312 9.110616 6886 181 2712 3993.0 Sardegna, Italy 419.982788 2.628522
292 NaN NaN Sicilia Italy 2020-10-23 04:24:46 38.115697 13.362357 14586 397 5649 8540.0 Sicilia, Italy 291.726360 2.721788
293 NaN NaN Toscana Italy 2020-10-23 04:24:46 43.769231 11.255889 26611 1229 12121 13261.0 Toscana, Italy 713.500307 4.618391
294 NaN NaN Umbria Italy 2020-10-23 04:24:46 43.106758 12.388247 5860 97 2493 3270.0 Umbria, Italy 664.387794 1.655290
295 NaN NaN Valle d'Aosta Italy 2020-10-23 04:24:46 45.737503 7.320149 2219 149 1165 905.0 Valle d'Aosta, Italy 1765.791861 6.714736
296 NaN NaN Veneto Italy 2020-10-23 04:24:46 45.434905 12.338452 39590 2301 24681 12608.0 Veneto, Italy 806.995072 5.812074

Multiple entries in the daily reports are due both to the existence of offshore territories (as for Denmark) and to the breaking the data of certain Countries into different areas (as for Italy).

In the time series the following applies.

For France, Netherlands and Denmark, in order to get the data related to the main land it is enough to search for Country_Region = countryname and Province_State = NaN.

The same can be done for the United Kingdom. This excludes the Isle of Man and Channel Islands.

For Australia, it is enough to sum up all the entries where Country_Region = countryname. This includes Tasmania.

The same can be done for China and this will include also Hainan and Hong Kong.

For Canada, summing all the entries include also the people from Diamond Princess and Grand Princes ships, we well as Prince Edward Island population.

In [34]:
print("Population of different Countries in million (source: Google):\n\n",
     "Italy:", italy_pop, "\n",
     "Spain:", spain_pop, "\n",
     "Germany:", germany_pop, "\n",
     "France:", france_pop, "\n",
     "Switzerland:", switzerland_pop, "\n",
     "Netherlands:", netherlands_pop, "\n",
     "Austria:", austria_pop, "\n",
     "Belgium:", belgium_pop, "\n",
     "Portugal:", portugal_pop, "\n",
     "Luxembourg:", luxembourg_pop, "\n",
     "Poland:", poland_pop, "\n",
     "Ireland:", ireland_pop, "\n",
     "Estonia:", estonia_pop, "\n",
     "Denmark:", denmark_pop, "\n",
     "Norway:", norway_pop, "\n",
     "Sweden:", sweden_pop, "\n",
     "Iceland:", iceland_pop, "\n",
     "Finland:", finland_pop, "\n",
     "UK:", uk_pop, "\n",
     "Brazil:", brazil_pop, "\n",
     "Russia:", russia_pop, "\n",
     "India:", india_pop, "\n")

print("NOTE: those figures are approximative.")
Population of different Countries in million (source: Google):

 Italy: 60.48 
 Spain: 46.66 
 Germany: 82.79 
 France: 66.99 
 Switzerland: 8.57 
 Netherlands: 17.18 
 Austria: 8.822 
 Belgium: 11.4 
 Portugal: 10.29 
 Luxembourg: 0.602 
 Poland: 37.97 
 Ireland: 4.904 
 Estonia: 1.328 
 Denmark: 5.603 
 Norway: 5.368 
 Sweden: 10.12 
 Iceland: 0.364 
 Finland: 5.513 
 UK: 66.44 
 Brazil: 212.559 
 Russia: 145.9 
 India: 1380 

NOTE: those figures are approximative.
In [35]:
print("Density of population of different Countries in people per square kilometre\n"\
      "(source: Google):\n\n",
     "Italy:", italy_dens, "\n",
     "Spain:", spain_dens, "\n",
     "Germany:", germany_dens, "\n",
     "France:", france_dens, "\n",
     "Switzerland:", switzerland_dens, "\n",
     "Netherlands:", netherlands_dens, "\n",
     "Austria:", austria_dens, "\n",
     "Belgium:", belgium_dens, "\n",
     "Portugal:", portugal_dens, "\n",
     "Luxembourg:", luxembourg_dens, "\n",
     "Poland:", poland_dens, "\n",
     "Ireland:", ireland_dens, "\n",
     "Estonia:", poland_dens, "\n",
     "Denmark:", denmark_dens, "\n",
     "Norway:", norway_dens, "\n",
     "Sweden:", sweden_dens, "\n",
     "Iceland:", iceland_dens, "\n",
     "Finland:", finland_dens, "\n",
     "UK:", uk_dens, "\n",
     "Brazil:", brazil_dens, "\n",
     "Russia:", russia_dens, "\n",
     "India:", india_dens, "\n")

print("NOTE: those figures are approximative.")
Density of population of different Countries in people per square kilometre
(source: Google):

 Italy: 201.3 
 Spain: 91.4 
 Germany: 240 
 France: 122.34 
 Switzerland: 219 
 Netherlands: 488 
 Austria: 109 
 Belgium: 383 
 Portugal: 111 
 Luxembourg: 242 
 Poland: 124 
 Ireland: 72 
 Estonia: 124 
 Denmark: 134 
 Norway: 15 
 Sweden: 25 
 Iceland: 3 
 Finland: 15 
 UK: 274 
 Brazil: 25 
 Russia: 8.54 
 India: 464 

NOTE: those figures are approximative.
In [36]:
print("Median age of different Countries (source: Wikipedia):\n\n",
      "Finland:", finland_median_age, "\n",
      "Denmark:", denmark_median_age, "\n",
      "Norwayd:", norway_median_age, "\n",
      "Sweden:", sweden_median_age, "\n",
      "Iceland:", iceland_median_age, "\n",
      "Italy:", italy_median_age, "\n",
      "Spain:", spain_median_age, "\n",
      "France:", france_median_age, "\n",
      "Switzerland:", switzerland_median_age, "\n",
      "Netherlands:", netherlands_median_age, "\n",
      "Austria:", austria_median_age, "\n",
      "Belgium:", belgium_median_age, "\n",
      "Portugal:", portugal_median_age, "\n",
      "Luxembourg:", luxembourg_median_age, "\n",
      "Polandd:", poland_median_age, "\n",
      "Ireland:", ireland_median_age, "\n",
      "Estonia:", estonia_median_age, "\n",
      "Brazil:", brazil_median_age, "\n",
      "Russia:", russia_median_age, "\n",
      "India:", india_median_age, "\n")
      
print("NOTE: those figures are from year 2018.")
Median age of different Countries (source: Wikipedia):

 Finland: 42.5 
 Denmark: 42.2 
 Norwayd: 39.2 
 Sweden: 41.2 
 Iceland: 36.5 
 Italy: 45.5 
 Spain: 42.7 
 France: 41.4 
 Switzerland: 42.4 
 Netherlands: 42.6 
 Austria: 44.0 
 Belgium: 41.4 
 Portugal: 42.2 
 Luxembourg: 39.3 
 Polandd: 39.7 
 Ireland: 36.5 
 Estonia: 41.6 
 Brazil: 31.4 
 Russia: 38.6 
 India: 26.8 

NOTE: those figures are from year 2018.
In [37]:
pd.options.display.max_colwidth = 150

print("Containment actions by the Finnish Government:\n")
# Setting both text and column headers text aligned to the left
# and omitting the indexes
measures.style.set_properties(**{'text-align': 'left'}).\
set_table_styles([ dict(selector='th', props=[('text-align', 'left')] ) ]).hide_index()
Containment actions by the Finnish Government:

Out[37]:
Date Actions
12.3. First containment measures: gathering of more than 500 people banned
16.3. State of emergency declared: closing shools, universities, museums, theatres, libraries, sport facilities; gathering of more than 10 people banned
28.3. Additional containment measures: Uusimaa region borders closed, restaurant dining forbidden
11.4. Additional containment measures: No passengers in ships from Germany, Sweden, Estonia
15.4. First releasing measures: Uusima border re-opened
14.5. More releasing misures: schools opening, business travell allowed within Schengen
1.6. Further releasing: gathering up to 50 people allowed, reopening of bars and restaurants, reopening of museums and theatres
15.6. End of state of emergency

5.3. Data Cleansing

In [38]:
# Fixing the errors in the original data in the Active column
daily_report['Active'] = daily_report['Confirmed'] - daily_report['Deaths'] - daily_report['Recovered']
In [39]:
# Converting null values in strings with value "Not applicable"
world_conf_clean = world_confirmed.fillna("Not applicable")
world_recov_clean = world_recovered.fillna("Not applicable")
world_deceas_clean = world_deceased.fillna("Not applicable")
daily_rep_clean = daily_report.fillna("Not applicable")

5.4. Data Preparation

5.4.1. New datasets with no NaN, no GPS coordinates / list of days / list of Countries

In [40]:
# Dropping the GPS coordinates and storing the result in new datasets
world_conf_short = world_conf_clean.drop(['Lat', 'Long'], axis=1)
world_recov_short = world_recov_clean.drop(['Lat', 'Long'], axis=1)
world_deceas_short = world_deceas_clean.drop(['Lat', 'Long'], axis=1)
# Dropping the columns not related to the cases counters
daily_rep_short = daily_rep_clean.drop(['Lat',
                                        'Long_',
                                        'Last_Update',
                                        'FIPS',
                                        'Admin2',
                                        'Combined_Key'],\
                                       axis=1)

# Grouping by Province/State and storing the result in new datasets
world_conf_group = world_conf_short.groupby(['Country/Region']).sum()
world_recov_group = world_recov_short.groupby(['Country/Region']).sum()
world_deceas_group = world_deceas_short.groupby(['Country/Region']).sum()
daily_rep_group = daily_rep_short.groupby(['Country_Region']).sum()
In [41]:
# Creating a list of dates

# Extracting only the columns containing the virus cases data for each day
world_conf_data = world_confirmed.iloc[:,4:]
# Extracting the column values (dates) and putting them in a list
days_all = world_conf_data.columns.values.tolist()

# Initializing an empty list
days_tot = []
# Looping through the number of days
for i in list(range(len(days_all))):
    # Extracting day and month and taking just the string value
    new_element=re.findall("[0-9]+[/][0-9]+", days_all[i])[0]
    # Adding the result to the list
    days_tot.append(new_element)
    
print("List of days for the plots:\n")
days_tot
List of days for the plots:

Out[41]:
['1/22',
 '1/23',
 '1/24',
 '1/25',
 '1/26',
 '1/27',
 '1/28',
 '1/29',
 '1/30',
 '1/31',
 '2/1',
 '2/2',
 '2/3',
 '2/4',
 '2/5',
 '2/6',
 '2/7',
 '2/8',
 '2/9',
 '2/10',
 '2/11',
 '2/12',
 '2/13',
 '2/14',
 '2/15',
 '2/16',
 '2/17',
 '2/18',
 '2/19',
 '2/20',
 '2/21',
 '2/22',
 '2/23',
 '2/24',
 '2/25',
 '2/26',
 '2/27',
 '2/28',
 '2/29',
 '3/1',
 '3/2',
 '3/3',
 '3/4',
 '3/5',
 '3/6',
 '3/7',
 '3/8',
 '3/9',
 '3/10',
 '3/11',
 '3/12',
 '3/13',
 '3/14',
 '3/15',
 '3/16',
 '3/17',
 '3/18',
 '3/19',
 '3/20',
 '3/21',
 '3/22',
 '3/23',
 '3/24',
 '3/25',
 '3/26',
 '3/27',
 '3/28',
 '3/29',
 '3/30',
 '3/31',
 '4/1',
 '4/2',
 '4/3',
 '4/4',
 '4/5',
 '4/6',
 '4/7',
 '4/8',
 '4/9',
 '4/10',
 '4/11',
 '4/12',
 '4/13',
 '4/14',
 '4/15',
 '4/16',
 '4/17',
 '4/18',
 '4/19',
 '4/20',
 '4/21',
 '4/22',
 '4/23',
 '4/24',
 '4/25',
 '4/26',
 '4/27',
 '4/28',
 '4/29',
 '4/30',
 '5/1',
 '5/2',
 '5/3',
 '5/4',
 '5/5',
 '5/6',
 '5/7',
 '5/8',
 '5/9',
 '5/10',
 '5/11',
 '5/12',
 '5/13',
 '5/14',
 '5/15',
 '5/16',
 '5/17',
 '5/18',
 '5/19',
 '5/20',
 '5/21',
 '5/22',
 '5/23',
 '5/24',
 '5/25',
 '5/26',
 '5/27',
 '5/28',
 '5/29',
 '5/30',
 '5/31',
 '6/1',
 '6/2',
 '6/3',
 '6/4',
 '6/5',
 '6/6',
 '6/7',
 '6/8',
 '6/9',
 '6/10',
 '6/11',
 '6/12',
 '6/13',
 '6/14',
 '6/15',
 '6/16',
 '6/17',
 '6/18',
 '6/19',
 '6/20',
 '6/21',
 '6/22',
 '6/23',
 '6/24',
 '6/25',
 '6/26',
 '6/27',
 '6/28',
 '6/29',
 '6/30',
 '7/1',
 '7/2',
 '7/3',
 '7/4',
 '7/5',
 '7/6',
 '7/7',
 '7/8',
 '7/9',
 '7/10',
 '7/11',
 '7/12',
 '7/13',
 '7/14',
 '7/15',
 '7/16',
 '7/17',
 '7/18',
 '7/19',
 '7/20',
 '7/21',
 '7/22',
 '7/23',
 '7/24',
 '7/25',
 '7/26',
 '7/27',
 '7/28',
 '7/29',
 '7/30',
 '7/31',
 '8/1',
 '8/2',
 '8/3',
 '8/4',
 '8/5',
 '8/6',
 '8/7',
 '8/8',
 '8/9',
 '8/10',
 '8/11',
 '8/12',
 '8/13',
 '8/14',
 '8/15',
 '8/16',
 '8/17',
 '8/18',
 '8/19',
 '8/20',
 '8/21',
 '8/22',
 '8/23',
 '8/24',
 '8/25',
 '8/26',
 '8/27',
 '8/28',
 '8/29',
 '8/30',
 '8/31',
 '9/1',
 '9/2',
 '9/3',
 '9/4',
 '9/5',
 '9/6',
 '9/7',
 '9/8',
 '9/9',
 '9/10',
 '9/11',
 '9/12',
 '9/13',
 '9/14',
 '9/15',
 '9/16',
 '9/17',
 '9/18',
 '9/19',
 '9/20',
 '9/21',
 '9/22',
 '9/23',
 '9/24',
 '9/25',
 '9/26',
 '9/27',
 '9/28',
 '9/29',
 '9/30',
 '10/1',
 '10/2',
 '10/3',
 '10/4',
 '10/5',
 '10/6',
 '10/7',
 '10/8',
 '10/9',
 '10/10',
 '10/11',
 '10/12',
 '10/13',
 '10/14',
 '10/15',
 '10/16',
 '10/17',
 '10/18',
 '10/19',
 '10/20',
 '10/21',
 '10/22']
In [42]:
# Listing the Countries
print("List of Countries:\n")
world_conf_group.index.to_list()
List of Countries:

Out[42]:
['Afghanistan',
 'Albania',
 'Algeria',
 'Andorra',
 'Angola',
 'Antigua and Barbuda',
 'Argentina',
 'Armenia',
 'Australia',
 'Austria',
 'Azerbaijan',
 'Bahamas',
 'Bahrain',
 'Bangladesh',
 'Barbados',
 'Belarus',
 'Belgium',
 'Belize',
 'Benin',
 'Bhutan',
 'Bolivia',
 'Bosnia and Herzegovina',
 'Botswana',
 'Brazil',
 'Brunei',
 'Bulgaria',
 'Burkina Faso',
 'Burma',
 'Burundi',
 'Cabo Verde',
 'Cambodia',
 'Cameroon',
 'Canada',
 'Central African Republic',
 'Chad',
 'Chile',
 'China',
 'Colombia',
 'Comoros',
 'Congo (Brazzaville)',
 'Congo (Kinshasa)',
 'Costa Rica',
 "Cote d'Ivoire",
 'Croatia',
 'Cuba',
 'Cyprus',
 'Czechia',
 'Denmark',
 'Diamond Princess',
 'Djibouti',
 'Dominica',
 'Dominican Republic',
 'Ecuador',
 'Egypt',
 'El Salvador',
 'Equatorial Guinea',
 'Eritrea',
 'Estonia',
 'Eswatini',
 'Ethiopia',
 'Fiji',
 'Finland',
 'France',
 'Gabon',
 'Gambia',
 'Georgia',
 'Germany',
 'Ghana',
 'Greece',
 'Grenada',
 'Guatemala',
 'Guinea',
 'Guinea-Bissau',
 'Guyana',
 'Haiti',
 'Holy See',
 'Honduras',
 'Hungary',
 'Iceland',
 'India',
 'Indonesia',
 'Iran',
 'Iraq',
 'Ireland',
 'Israel',
 'Italy',
 'Jamaica',
 'Japan',
 'Jordan',
 'Kazakhstan',
 'Kenya',
 'Korea, South',
 'Kosovo',
 'Kuwait',
 'Kyrgyzstan',
 'Laos',
 'Latvia',
 'Lebanon',
 'Lesotho',
 'Liberia',
 'Libya',
 'Liechtenstein',
 'Lithuania',
 'Luxembourg',
 'MS Zaandam',
 'Madagascar',
 'Malawi',
 'Malaysia',
 'Maldives',
 'Mali',
 'Malta',
 'Mauritania',
 'Mauritius',
 'Mexico',
 'Moldova',
 'Monaco',
 'Mongolia',
 'Montenegro',
 'Morocco',
 'Mozambique',
 'Namibia',
 'Nepal',
 'Netherlands',
 'New Zealand',
 'Nicaragua',
 'Niger',
 'Nigeria',
 'North Macedonia',
 'Norway',
 'Oman',
 'Pakistan',
 'Panama',
 'Papua New Guinea',
 'Paraguay',
 'Peru',
 'Philippines',
 'Poland',
 'Portugal',
 'Qatar',
 'Romania',
 'Russia',
 'Rwanda',
 'Saint Kitts and Nevis',
 'Saint Lucia',
 'Saint Vincent and the Grenadines',
 'San Marino',
 'Sao Tome and Principe',
 'Saudi Arabia',
 'Senegal',
 'Serbia',
 'Seychelles',
 'Sierra Leone',
 'Singapore',
 'Slovakia',
 'Slovenia',
 'Solomon Islands',
 'Somalia',
 'South Africa',
 'South Sudan',
 'Spain',
 'Sri Lanka',
 'Sudan',
 'Suriname',
 'Sweden',
 'Switzerland',
 'Syria',
 'Taiwan*',
 'Tajikistan',
 'Tanzania',
 'Thailand',
 'Timor-Leste',
 'Togo',
 'Trinidad and Tobago',
 'Tunisia',
 'Turkey',
 'US',
 'Uganda',
 'Ukraine',
 'United Arab Emirates',
 'United Kingdom',
 'Uruguay',
 'Uzbekistan',
 'Venezuela',
 'Vietnam',
 'West Bank and Gaza',
 'Western Sahara',
 'Yemen',
 'Zambia',
 'Zimbabwe']

5.4.2. Population age data

In [43]:
# Creating a Pandas series containing median ages for different Countries
countries_median_age = pd.Series({'Finland': finland_median_age,
                                  'Denmark': denmark_median_age,
                                  'Norway': norway_median_age,
                                  'Sweden': sweden_median_age,
                                  'Iceland': iceland_median_age,
                                  'Italy': italy_median_age,
                                  'Spain': spain_median_age,
                                  'Germany': germany_median_age,
                                  'France': france_median_age,
                                  'Switzerland': switzerland_median_age,
                                  'Netherlands': netherlands_median_age,
                                  'Austria': austria_median_age,
                                  'Belgium': belgium_median_age,
                                  'Portugal': portugal_median_age,
                                  'Luxembourg': luxembourg_median_age,
                                  'Poland': poland_median_age,
                                  'Ireland': ireland_median_age,
                                  'Estonia': estonia_median_age,
                                  'UK': uk_median_age,
                                  'US': us_median_age,
                                  'Brazil': brazil_median_age,
                                  'Russia': russia_median_age,
                                  'India': india_median_age})
# Calculating the minimum value
median_age_min = countries_median_age.min()
# Calculating the maximum value
median_age_max = countries_median_age.max()
# Calculating the median age range
median_age_range = median_age_max - median_age_min
print("The range of the median age in the Countries that are analyzed here is: "\
      "{:.1f} years"\
     .format(median_age_range))

# Extracting data related to Scandinavia
scand_median_age = countries_median_age[['Finland', 'Denmark', 'Norway', 'Sweden', 'Iceland']]
# Calculating the median age range
scand_median_age_range = scand_median_age.max() - scand_median_age.min()
print("The range of the median age in the Scandinavian Countries that are analyzed here is: "\
      "{:.1f} years"\
     .format(scand_median_age_range))

# Extracting data related to EU
eu_median_age = countries_median_age[['Finland', 'Denmark', 'Sweden',
                                      'Italy', 'Spain', 'Germany', 'France',
                                      'Netherlands', 'Austria', 'Belgium', 'Portugal',
                                      'Luxembourg', 'Poland', 'Ireland', 'Estonia']]
# Calculating the median age range
eu_median_age_range = eu_median_age.max() - eu_median_age.min()
print("The range of the median age in the EU Countries that are analyzed here is: "\
      "{:.1f} years"\
     .format(eu_median_age_range))
The range of the median age in the Countries that are analyzed here is: 18.9 years
The range of the median age in the Scandinavian Countries that are analyzed here is: 6.0 years
The range of the median age in the EU Countries that are analyzed here is: 9.2 years

5.4.3. World Data

In [44]:
# Selecting only the columns with the daily data
world_conf = world_conf_short.iloc[:,2:]
world_recov = world_recov_short.iloc[:,2:]
world_deceas = world_deceas_short.iloc[:,2:]
In [45]:
# Calculating cumulative worldwide data for each day
world_conf_tot = world_conf.sum()
world_recov_tot = world_recov.sum()
world_deceas_tot = world_deceas.sum()
In [46]:
# Calculating the active cases for each day
world_act_tot = list(np.array(world_conf_tot) - \
                     np.array(world_recov_tot) - \
                     np.array(world_deceas_tot))
In [47]:
# Calculating the daily increments in the deceased cases
world_conf_incr = calc_increments(world_conf_tot)
# Calculating the daily increments in the confirmed cases
world_deceas_incr = calc_increments(world_deceas_tot)
In [48]:
# Finding the cumulative per capita data worldwide
world_conf_perc = pop_perc(world_conf_tot, 7.8*1000)
world_deceas_perc = pop_perc(world_deceas_tot, 7.8*1000)

5.4.4. Finnish data

In [49]:
# Calling the function extract_country to extract data related to Finland
# (skipping the first 6 days since they contain no confirmed cases)
Finland_6 = extract_country("Finland", "Not applicable", 6)
# Extracting the confirmed cases
finland_conf_6 = Finland_6[0]
# Extracting the recovered cases
finland_recov_6 = Finland_6[1]
# Extracting the decased cases
finland_deceas_6 = Finland_6[2]

# Creating a list of days to use for Finnish charts
# (skipping the first 6 days)
days_fin = days_tot[6:]
In [50]:
print("Compact Finnish data set:\n")
print("first day:", days_fin[0])
print("number of days:", len(days_fin))
Compact Finnish data set:

first day: 1/28
number of days: 269
In [51]:
# Visualizing the complete series
print("Confirmed cases time series:")
finland_conf_6
Confirmed cases time series:
Out[51]:
[0,
 1,
 1,
 1,
 1,
 1,
 1,
 1,
 1,
 1,
 1,
 1,
 1,
 1,
 1,
 1,
 1,
 1,
 1,
 1,
 1,
 1,
 1,
 1,
 1,
 1,
 1,
 1,
 1,
 2,
 2,
 2,
 3,
 6,
 6,
 6,
 6,
 12,
 15,
 15,
 23,
 30,
 40,
 59,
 59,
 155,
 225,
 244,
 277,
 321,
 336,
 400,
 450,
 523,
 626,
 700,
 792,
 880,
 958,
 1041,
 1167,
 1240,
 1352,
 1418,
 1446,
 1518,
 1615,
 1882,
 1927,
 2176,
 2308,
 2487,
 2605,
 2769,
 2905,
 2974,
 3064,
 3161,
 3237,
 3369,
 3489,
 3681,
 3783,
 3868,
 4014,
 4129,
 4284,
 4395,
 4475,
 4576,
 4695,
 4740,
 4906,
 4995,
 5051,
 5176,
 5254,
 5327,
 5412,
 5573,
 5673,
 5738,
 5880,
 5962,
 5984,
 6003,
 6054,
 6145,
 6228,
 6286,
 6347,
 6380,
 6399,
 6443,
 6493,
 6537,
 6568,
 6579,
 6599,
 6628,
 6692,
 6743,
 6776,
 6826,
 6859,
 6885,
 6887,
 6911,
 6911,
 6941,
 6964,
 6981,
 7001,
 7025,
 7040,
 7064,
 7073,
 7087,
 7104,
 7108,
 7112,
 7117,
 7119,
 7133,
 7142,
 7143,
 7144,
 7155,
 7167,
 7172,
 7191,
 7198,
 7198,
 7209,
 7214,
 7236,
 7241,
 7242,
 7248,
 7253,
 7257,
 7262,
 7265,
 7273,
 7279,
 7291,
 7294,
 7295,
 7301,
 7296,
 7293,
 7301,
 7318,
 7335,
 7340,
 7351,
 7362,
 7372,
 7380,
 7388,
 7393,
 7398,
 7404,
 7414,
 7423,
 7432,
 7443,
 7453,
 7466,
 7483,
 7512,
 7532,
 7554,
 7568,
 7584,
 7601,
 7623,
 7642,
 7683,
 7700,
 7720,
 7731,
 7752,
 7776,
 7805,
 7842,
 7871,
 7906,
 7920,
 7938,
 7981,
 8002,
 8019,
 8042,
 8049,
 8077,
 8086,
 8142,
 8161,
 8200,
 8225,
 8261,
 8291,
 8327,
 8337,
 8430,
 8469,
 8512,
 8557,
 8580,
 8627,
 8725,
 8750,
 8799,
 8858,
 8922,
 8980,
 9046,
 9195,
 9288,
 9379,
 9484,
 9577,
 9682,
 9743,
 9892,
 9992,
 10103,
 10244,
 10391,
 10538,
 10702,
 10929,
 11049,
 11345,
 11580,
 11849,
 11998,
 12212,
 12499,
 12703,
 12944,
 13133,
 13293,
 13424,
 13555,
 13849,
 14071,
 14255]
In [52]:
# Visualizing the complete series
print("Recovered cases time series:")
finland_recov_6
Recovered cases time series:
Out[52]:
[0,
 0,
 0,
 0,
 0,
 0,
 0,
 0,
 0,
 0,
 0,
 0,
 0,
 0,
 0,
 1,
 1,
 1,
 1,
 1,
 1,
 1,
 1,
 1,
 1,
 1,
 1,
 1,
 1,
 1,
 1,
 1,
 1,
 1,
 1,
 1,
 1,
 1,
 1,
 1,
 1,
 1,
 1,
 1,
 1,
 1,
 1,
 10,
 10,
 10,
 10,
 10,
 10,
 10,
 10,
 10,
 10,
 10,
 10,
 10,
 10,
 10,
 10,
 10,
 10,
 300,
 300,
 300,
 300,
 300,
 300,
 300,
 300,
 300,
 300,
 300,
 300,
 300,
 300,
 1700,
 1700,
 1700,
 1700,
 2000,
 2000,
 2000,
 2000,
 2500,
 2500,
 2500,
 2500,
 2800,
 2800,
 3000,
 3000,
 3000,
 3000,
 3500,
 3500,
 3500,
 3500,
 4000,
 4000,
 4000,
 4000,
 4300,
 4300,
 4300,
 5000,
 5000,
 5000,
 5000,
 5000,
 4800,
 4800,
 4800,
 4800,
 4800,
 5100,
 5100,
 5100,
 5500,
 5500,
 5500,
 5500,
 5500,
 5500,
 5500,
 5800,
 5800,
 5800,
 5800,
 5800,
 5800,
 5800,
 6200,
 6200,
 6200,
 6200,
 6200,
 6200,
 6200,
 6200,
 6200,
 6200,
 6200,
 6400,
 6400,
 6600,
 6600,
 6600,
 6600,
 6600,
 6600,
 6600,
 6700,
 6700,
 6700,
 6700,
 6700,
 6700,
 6700,
 6800,
 6800,
 6800,
 6800,
 6800,
 6800,
 6800,
 6880,
 6880,
 6880,
 6880,
 6880,
 6880,
 6880,
 6920,
 6920,
 6920,
 6920,
 6920,
 6920,
 6920,
 6950,
 6950,
 6950,
 6950,
 6950,
 6950,
 6950,
 6980,
 6980,
 6980,
 6980,
 6980,
 6980,
 6980,
 7050,
 7050,
 7050,
 7050,
 7050,
 7050,
 7050,
 7100,
 7100,
 7100,
 7100,
 7100,
 7100,
 7100,
 7200,
 7200,
 7200,
 7200,
 7200,
 7200,
 7200,
 7350,
 7350,
 7350,
 7350,
 7350,
 7350,
 7350,
 7500,
 7500,
 7500,
 7500,
 7500,
 7500,
 7500,
 7700,
 7700,
 7700,
 7700,
 7700,
 7700,
 7700,
 7850,
 7850,
 7850,
 7850,
 7850,
 7850,
 7850,
 8100,
 8100,
 8100,
 8100,
 8100,
 8100,
 8100,
 8500,
 8500,
 8500,
 8500,
 8500,
 8500,
 8500,
 9100,
 9100,
 9100,
 9100,
 9100,
 9100,
 9100,
 9800,
 9800]
In [53]:
# Visualizing the complete series
print("Deceased cases time series:")
finland_deceas_6
Deceased cases time series:
Out[53]:
[0,
 0,
 0,
 0,
 0,
 0,
 0,
 0,
 0,
 0,
 0,
 0,
 0,
 0,
 0,
 0,
 0,
 0,
 0,
 0,
 0,
 0,
 0,
 0,
 0,
 0,
 0,
 0,
 0,
 0,
 0,
 0,
 0,
 0,
 0,
 0,
 0,
 0,
 0,
 0,
 0,
 0,
 0,
 0,
 0,
 0,
 0,
 0,
 0,
 0,
 0,
 0,
 0,
 1,
 1,
 1,
 1,
 3,
 5,
 7,
 9,
 11,
 13,
 17,
 17,
 19,
 20,
 25,
 28,
 27,
 34,
 40,
 42,
 48,
 49,
 56,
 59,
 64,
 72,
 75,
 82,
 90,
 94,
 98,
 141,
 149,
 172,
 177,
 186,
 190,
 193,
 199,
 206,
 211,
 218,
 220,
 230,
 240,
 246,
 252,
 255,
 260,
 265,
 267,
 271,
 275,
 284,
 287,
 293,
 297,
 298,
 300,
 301,
 304,
 306,
 306,
 306,
 307,
 308,
 312,
 313,
 313,
 314,
 316,
 320,
 318,
 320,
 321,
 322,
 322,
 322,
 323,
 323,
 324,
 324,
 325,
 325,
 325,
 326,
 326,
 326,
 326,
 326,
 326,
 326,
 326,
 327,
 327,
 327,
 327,
 328,
 328,
 328,
 328,
 328,
 328,
 328,
 329,
 329,
 329,
 329,
 329,
 329,
 329,
 329,
 329,
 329,
 329,
 329,
 328,
 328,
 328,
 328,
 328,
 328,
 328,
 328,
 328,
 329,
 329,
 329,
 329,
 329,
 329,
 329,
 329,
 329,
 329,
 329,
 331,
 331,
 331,
 331,
 331,
 331,
 333,
 333,
 333,
 333,
 333,
 333,
 333,
 334,
 334,
 334,
 334,
 334,
 334,
 334,
 335,
 335,
 335,
 335,
 335,
 335,
 335,
 336,
 336,
 336,
 336,
 336,
 336,
 336,
 336,
 336,
 337,
 337,
 337,
 337,
 337,
 337,
 339,
 339,
 339,
 339,
 339,
 339,
 341,
 341,
 343,
 343,
 343,
 343,
 343,
 345,
 345,
 344,
 344,
 345,
 345,
 345,
 346,
 346,
 346,
 346,
 346,
 346,
 346,
 346,
 346,
 350,
 350,
 351,
 351,
 351,
 351,
 351,
 355,
 355]
In [54]:
# Calculating the active cases
finland_act_6 = list(np.array(finland_conf_6) - \
                     np.array(finland_recov_6) - \
                     np.array(finland_deceas_6))

finland_act_6
Out[54]:
[0,
 1,
 1,
 1,
 1,
 1,
 1,
 1,
 1,
 1,
 1,
 1,
 1,
 1,
 1,
 0,
 0,
 0,
 0,
 0,
 0,
 0,
 0,
 0,
 0,
 0,
 0,
 0,
 0,
 1,
 1,
 1,
 2,
 5,
 5,
 5,
 5,
 11,
 14,
 14,
 22,
 29,
 39,
 58,
 58,
 154,
 224,
 234,
 267,
 311,
 326,
 390,
 440,
 512,
 615,
 689,
 781,
 867,
 943,
 1024,
 1148,
 1219,
 1329,
 1391,
 1419,
 1199,
 1295,
 1557,
 1599,
 1849,
 1974,
 2147,
 2263,
 2421,
 2556,
 2618,
 2705,
 2797,
 2865,
 1594,
 1707,
 1891,
 1989,
 1770,
 1873,
 1980,
 2112,
 1718,
 1789,
 1886,
 2002,
 1741,
 1900,
 1784,
 1833,
 1956,
 2024,
 1587,
 1666,
 1821,
 1918,
 1478,
 1615,
 1695,
 1713,
 1428,
 1470,
 1558,
 935,
 989,
 1049,
 1080,
 1098,
 1339,
 1387,
 1431,
 1462,
 1472,
 1191,
 1216,
 1279,
 930,
 962,
 1010,
 1039,
 1067,
 1067,
 1090,
 789,
 819,
 842,
 858,
 878,
 901,
 916,
 539,
 548,
 562,
 578,
 582,
 586,
 591,
 593,
 607,
 616,
 617,
 417,
 428,
 240,
 245,
 263,
 270,
 270,
 281,
 286,
 208,
 213,
 213,
 219,
 224,
 228,
 233,
 136,
 144,
 150,
 162,
 165,
 166,
 172,
 88,
 85,
 93,
 110,
 127,
 132,
 143,
 114,
 124,
 131,
 139,
 144,
 149,
 155,
 135,
 144,
 153,
 164,
 174,
 187,
 202,
 201,
 221,
 243,
 257,
 273,
 288,
 310,
 259,
 300,
 317,
 337,
 348,
 368,
 392,
 371,
 408,
 437,
 472,
 486,
 503,
 546,
 467,
 484,
 507,
 514,
 542,
 550,
 606,
 475,
 514,
 539,
 575,
 605,
 641,
 651,
 593,
 632,
 675,
 720,
 743,
 790,
 886,
 711,
 760,
 819,
 883,
 941,
 1005,
 1154,
 1095,
 1186,
 1291,
 1384,
 1489,
 1548,
 1697,
 1548,
 1659,
 1799,
 1946,
 2093,
 2256,
 2483,
 2203,
 2499,
 2734,
 3003,
 3152,
 3366,
 3653,
 3253,
 3494,
 3682,
 3842,
 3973,
 4104,
 4398,
 3916,
 4100]
In [55]:
# Creating a list of same lenght as days_fin containing the increment of
# the confirmed cases compared to the previous day (first derivate)
# This tells how quickly the confirmed cases are growing
finland_conf_incr_6 = calc_increments(finland_conf_6)

# Visualizing the all series
print("Daily increment in confirmed cases time series:")
finland_conf_incr_6
Daily increment in confirmed cases time series:
Out[55]:
[0.0,
 1,
 0,
 0,
 0,
 0,
 0,
 0,
 0,
 0,
 0,
 0,
 0,
 0,
 0,
 0,
 0,
 0,
 0,
 0,
 0,
 0,
 0,
 0,
 0,
 0,
 0,
 0,
 0,
 1,
 0,
 0,
 1,
 3,
 0,
 0,
 0,
 6,
 3,
 0,
 8,
 7,
 10,
 19,
 0,
 96,
 70,
 19,
 33,
 44,
 15,
 64,
 50,
 73,
 103,
 74,
 92,
 88,
 78,
 83,
 126,
 73,
 112,
 66,
 28,
 72,
 97,
 267,
 45,
 249,
 132,
 179,
 118,
 164,
 136,
 69,
 90,
 97,
 76,
 132,
 120,
 192,
 102,
 85,
 146,
 115,
 155,
 111,
 80,
 101,
 119,
 45,
 166,
 89,
 56,
 125,
 78,
 73,
 85,
 161,
 100,
 65,
 142,
 82,
 22,
 19,
 51,
 91,
 83,
 58,
 61,
 33,
 19,
 44,
 50,
 44,
 31,
 11,
 20,
 29,
 64,
 51,
 33,
 50,
 33,
 26,
 2,
 24,
 0,
 30,
 23,
 17,
 20,
 24,
 15,
 24,
 9,
 14,
 17,
 4,
 4,
 5,
 2,
 14,
 9,
 1,
 1,
 11,
 12,
 5,
 19,
 7,
 0,
 11,
 5,
 22,
 5,
 1,
 6,
 5,
 4,
 5,
 3,
 8,
 6,
 12,
 3,
 1,
 6,
 -5,
 -3,
 8,
 17,
 17,
 5,
 11,
 11,
 10,
 8,
 8,
 5,
 5,
 6,
 10,
 9,
 9,
 11,
 10,
 13,
 17,
 29,
 20,
 22,
 14,
 16,
 17,
 22,
 19,
 41,
 17,
 20,
 11,
 21,
 24,
 29,
 37,
 29,
 35,
 14,
 18,
 43,
 21,
 17,
 23,
 7,
 28,
 9,
 56,
 19,
 39,
 25,
 36,
 30,
 36,
 10,
 93,
 39,
 43,
 45,
 23,
 47,
 98,
 25,
 49,
 59,
 64,
 58,
 66,
 149,
 93,
 91,
 105,
 93,
 105,
 61,
 149,
 100,
 111,
 141,
 147,
 147,
 164,
 227,
 120,
 296,
 235,
 269,
 149,
 214,
 287,
 204,
 241,
 189,
 160,
 131,
 131,
 294,
 222,
 184]
In [56]:
# Calculating the incremental values of the deceased cases
finland_deceas_incr_6 = calc_increments(finland_deceas_6)
In [57]:
# Extracting all data about Finland (from the first available day)
Finland_0 = extract_country("Finland", "Not applicable", 0)
# Extracting the confirmed cases
finland_conf_0 = Finland_0[0]
# Extracting the recovered cases
finland_recov_0 = Finland_0[1]
# Extracting the decased cases
finland_deceas_0 = Finland_0[2]

# Calculating the incremental values of the confirmed cases
finland_conf_incr_0 = calc_increments(finland_conf_0)
# Calculating the incremental values of the deceased cases
finland_deceas_incr_0 = calc_increments(finland_deceas_0)

# Extracting the dataseries from the first confirmed case in the Country
# by using the function extract_non_null
# (the function extracts all non null values, not only the leading zeros
# but this is OK since the total confirmed cases cannot decrease)

finland_conf_pos = extract_non_null(finland_conf_0)
In [58]:
# Using the function pop_perc to calculate the confirmed cumulative cases
# in percentage of the total population
finland_conf_0_perc = pop_perc(finland_conf_0, finland_pop)
# Doing the same for the deceased cases
finland_deceas_0_perc = pop_perc(finland_deceas_0, finland_pop)

5.4.5. Data from other Scandinavian Countries and Estonia

In [59]:
# Preparing data related to the other Scandinavian Countries and Estonia

# 1. Skipping the first 6 days of the time series

# Denmark
# Calling the function prep_country_data to extract data related to the Country
denmark_6 = prep_country_data("Denmark", denmark_pop, "Not applicable", 6)
# Cumulative confirmed cases
denmark_conf_6 = denmark_6[0]
# Cumulative recovered cases
denmark_recov_6 = denmark_6[1]
# Cumulative deceased cases
denmark_deceas_6 = denmark_6[2]
# Daily confirmed cases
denmark_conf_incr_6 = denmark_6[4]

# Norway
norway_6 = prep_country_data("Norway", norway_pop, "Not applicable", 6)
norway_conf_6 = norway_6[0]
norway_recov_6 = norway_6[1]
norway_deceas_6 = norway_6[2]
norway_conf_incr_6 = norway_6[4]

# Sweden
sweden_6 = prep_country_data("Sweden", sweden_pop, "Not applicable", 6)
sweden_conf_6 = sweden_6[0]
sweden_recov_6 = sweden_6[1]
sweden_deceas_6 = sweden_6[2]
sweden_conf_incr_6 = sweden_6[4]

# Iceland
iceland_6 = prep_country_data("Iceland", iceland_pop, "Not applicable", 6)
iceland_conf_6 = iceland_6[0]
iceland_recov_6 = iceland_6[1]
iceland_deceas_6 = iceland_6[2]
iceland_conf_incr_6 = iceland_6[4]

# Estonia
estonia_6 = prep_country_data("Estonia", estonia_pop, "Not applicable", 6)
estonia_conf_6 = estonia_6[0]
estonia_recov_6 = estonia_6[1]
estonia_deceas_6 = estonia_6[2]
estonia_conf_incr_6 = estonia_6[4]

# 2. complete time series

# Denmark
# Calling the function prep_country_data to extract data related to the Country
denmark_0 = prep_country_data("Denmark", denmark_pop, "Not applicable", 0)
# Cumulative confirmed cases
denmark_conf_0 = denmark_0[0]
# Cumulative recovered cases
denmark_recov_0 = denmark_0[1]
# Cumulative deceased cases
denmark_deceas_0 = denmark_0[2]
# Cumulative active cases
denmark_act_0 = denmark_0[3]
# Daily confirmed cases
denmark_conf_incr_0 = denmark_0[4]
# Daily deceased cases
denmark_deceas_incr_0 = denmark_0[5]
# Cumulative confirmed cases starting from the day of the first positive case
denmark_conf_pos = denmark_0[6]
# Cumulative confirmed cases per capita
denmark_conf_0_perc = denmark_0[7]
# Cumulative deceased cases per capita
denmark_deceas_0_perc = denmark_0[8]

# Norway
norway_0 = prep_country_data("Norway", norway_pop, "Not applicable", 0)
norway_conf_0 = norway_0[0]
norway_recov_0 = norway_0[1]
norway_deceas_0 = norway_0[2]
norway_act_0 = norway_0[3]
norway_conf_incr_0 = norway_0[4]
norway_deceas_incr_0 = norway_0[5]
norway_conf_pos = norway_0[6]
norway_conf_0_perc = norway_0[7]
norway_deceas_0_perc = norway_0[8]

# Sweden
sweden_0 = prep_country_data("Sweden", sweden_pop, "Not applicable", 0)
sweden_conf_0 = sweden_0[0]
sweden_recov_0 = sweden_0[1]
sweden_deceas_0 = sweden_0[2]
sweden_act_0 = sweden_0[3]
sweden_conf_incr_0 = sweden_0[4]
sweden_deceas_incr_0 = sweden_0[5]
sweden_conf_pos = sweden_0[6]
sweden_conf_0_perc = sweden_0[7]
sweden_deceas_0_perc = sweden_0[8]

# Iceland
iceland_0 = prep_country_data("Iceland", iceland_pop, "Not applicable", 0)
iceland_conf_0 = iceland_0[0]
iceland_recov_0 = iceland_0[1]
iceland_deceas_0 = iceland_0[2]
iceland_act_0 = iceland_0[3]
iceland_conf_incr_0 = iceland_0[4]
iceland_deceas_incr_0 = iceland_0[5]
iceland_conf_pos = iceland_0[6]
iceland_conf_0_perc = iceland_0[7]
iceland_deceas_0_perc = iceland_0[8]

# Estonia
estonia_0 = prep_country_data("Estonia", estonia_pop, "Not applicable", 0)
estonia_conf_0 = estonia_0[0]
estonia_recov_0 = estonia_0[1]
estonia_deceas_0 = estonia_0[2]
estonia_act_0 = estonia_0[3]
estonia_conf_incr_0 = estonia_0[4]
estonia_deceas_incr_0 = estonia_0[5]
estonia_conf_pos = estonia_0[6]
estonia_conf_0_perc = estonia_0[7]
estonia_deceas_0_perc = estonia_0[8]

5.4.6. Data from other European Countries

In [60]:
# Calling the function prep_country_data to extract data related to Italy
italy_0 = prep_country_data("Italy", italy_pop, "Not applicable", 0)
# Cumulative confirmed cases
italy_conf_0 = italy_0[0]
# Cumulative recovered cases
italy_recov_0 = italy_0[1]
# Cumulative deceased cases
italy_deceas_0 = italy_0[2]
# Cumulative active cases
italy_act_0 = italy_0[3]
# Daily confirmed cases
italy_conf_incr_0 = italy_0[4]
# Daily deceased cases
italy_deceas_incr_0 = italy_0[5]
# Cumulative confirmed cases starting from the day of the first positive case
italy_conf_pos = italy_0[6]
# Cumulative confirmed cases per capita
italy_conf_0_perc = italy_0[7]
# Cumulative deceased cases per capita
italy_deceas_0_perc = italy_0[8]
In [61]:
# Preparing data related to Spain
spain_0 = prep_country_data("Spain", spain_pop, "Not applicable", 0)
spain_conf_0 = spain_0[0]
spain_recov_0 = spain_0[1]
spain_deceas_0 = spain_0[2]
spain_act_0 = spain_0[3]
spain_conf_incr_0 = spain_0[4]
spain_deceas_incr_0 = spain_0[5]
spain_conf_pos = spain_0[6]
spain_conf_0_perc = spain_0[7]
spain_deceas_0_perc = spain_0[8]
In [62]:
# Preparing data related to Germany
germany_0 = prep_country_data("Germany", germany_pop, "Not applicable", 0)
germany_conf_0 = germany_0[0]
germany_recov_0 = germany_0[1]
germany_deceas_0 = germany_0[2]
germany_act_0 = germany_0[3]
germany_conf_incr_0 = germany_0[4]
germany_deceas_incr_0 = germany_0[5]
germany_conf_pos = germany_0[6]
germany_conf_0_perc = germany_0[7]
germany_deceas_0_perc = germany_0[8]
In [63]:
# Preparing data related to France
france_0 = prep_country_data("France", france_pop, "Not applicable", 0)
france_conf_0 = france_0[0]
france_recov_0 = france_0[1]
france_deceas_0 = france_0[2]
france_act_0 = france_0[3]
france_conf_incr_0 = france_0[4]
france_deceas_incr_0 = france_0[5]
france_conf_pos = france_0[6]
france_conf_0_perc = france_0[7]
france_deceas_0_perc = france_0[8]
In [64]:
# Preparing data related to Switzerland
switzerland_0 = prep_country_data("Switzerland", switzerland_pop, "Not applicable", 0)
switzerland_conf_0 = switzerland_0[0]
switzerland_recov_0 = switzerland_0[1]
switzerland_deceas_0 = switzerland_0[2]
switzerland_act_0 = switzerland_0[3]
switzerland_conf_incr_0 = switzerland_0[4]
switzerland_deceas_incr_0 = switzerland_0[5]
switzerland_conf_pos = switzerland_0[6]
switzerland_conf_0_perc = switzerland_0[7]
switzerland_deceas_0_perc = switzerland_0[8]
In [65]:
# Preparing data related to Netherlands
netherlands_0 = prep_country_data("Netherlands", netherlands_pop, "Not applicable", 0)
netherlands_conf_0 = netherlands_0[0]
netherlands_recov_0 = netherlands_0[1]
netherlands_deceas_0 = netherlands_0[2]
netherlands_act_0 = netherlands_0[3]
netherlands_conf_incr_0 = netherlands_0[4]
netherlands_deceas_incr_0 = netherlands_0[5]
netherlands_conf_pos = netherlands_0[6]
netherlands_conf_0_perc = netherlands_0[7]
netherlands_deceas_0_perc = netherlands_0[8]
In [66]:
# Preparing data related to Austria
austria_0 = prep_country_data("Austria", austria_pop, "Not applicable", 0)
austria_conf_0 = austria_0[0]
austria_recov_0 = austria_0[1]
austria_deceas_0 = austria_0[2]
austria_act_0 = austria_0[3]
austria_conf_incr_0 = austria_0[4]
austria_deceas_incr_0 = austria_0[5]
austria_conf_pos = austria_0[6]
austria_conf_0_perc = austria_0[7]
austria_deceas_0_perc = austria_0[8]
In [67]:
# Preparing data related to Belgium
belgium_0 = prep_country_data("Belgium", belgium_pop, "Not applicable", 0)
belgium_conf_0 = belgium_0[0]
belgium_recov_0 = belgium_0[1]
belgium_deceas_0 = belgium_0[2]
belgium_act_0 = belgium_0[3]
belgium_conf_incr_0 = belgium_0[4]
belgium_deceas_incr_0 = belgium_0[5]
belgium_conf_pos = belgium_0[6]
belgium_conf_0_perc = belgium_0[7]
belgium_deceas_0_perc = belgium_0[8]
In [68]:
# Preparing data related to Portugal
portugal_0 = prep_country_data("Portugal", portugal_pop, "Not applicable", 0)
portugal_conf_0 = portugal_0[0]
portugal_recov_0 = portugal_0[1]
portugal_deceas_0 = portugal_0[2]
portugal_act_0 = portugal_0[3]
portugal_conf_incr_0 = portugal_0[4]
portugal_deceas_incr_0 = portugal_0[5]
portugal_conf_pos = portugal_0[6]
portugal_conf_0_perc = portugal_0[7]
portugal_deceas_0_perc = portugal_0[8]
In [69]:
# Preparing data related to Luxembourg
luxembourg_0 = prep_country_data("Luxembourg", luxembourg_pop, "Not applicable", 0)
luxembourg_conf_0 = luxembourg_0[0]
luxembourg_recov_0 = luxembourg_0[1]
luxembourg_deceas_0 = luxembourg_0[2]
luxembourg_act_0 = luxembourg_0[3]
luxembourg_conf_incr_0 = luxembourg_0[4]
luxembourg_deceas_incr_0 = luxembourg_0[5]
luxembourg_conf_pos = luxembourg_0[6]
luxembourg_conf_0_perc = luxembourg_0[7]
luxembourg_deceas_0_perc = luxembourg_0[8]
In [70]:
# Preparing data related to Poland
poland_0 = prep_country_data("Poland", poland_pop, "Not applicable", 0)
poland_conf_0 = poland_0[0]
poland_recov_0 = poland_0[1]
poland_deceas_0 = poland_0[2]
poland_act_0 = poland_0[3]
poland_conf_incr_0 = poland_0[4]
poland_deceas_incr_0 = poland_0[5]
poland_conf_pos = poland_0[6]
poland_conf_0_perc = poland_0[7]
poland_deceas_0_perc = poland_0[8]
In [71]:
# Preparing data related to Ireland
ireland_0 = prep_country_data("Ireland", ireland_pop, "Not applicable", 0)
ireland_conf_0 = ireland_0[0]
ireland_recov_0 = ireland_0[1]
ireland_deceas_0 = ireland_0[2]
ireland_act_0 = ireland_0[3]
ireland_conf_incr_0 = ireland_0[4]
ireland_deceas_incr_0 = ireland_0[5]
ireland_conf_pos = ireland_0[6]
ireland_conf_0_perc = ireland_0[7]
ireland_deceas_0_perc = ireland_0[8]

5.4.7. Data from UK and US

In [72]:
# Preparing data related to UK
uk_0 = prep_country_data("United Kingdom", uk_pop, "Not applicable", 0)
uk_conf_0 = uk_0[0]
uk_recov_0 = uk_0[1]
uk_deceas_0 = uk_0[2]
uk_act_0 = uk_0[3]
uk_conf_incr_0 = uk_0[4]
uk_deceas_incr_0 = uk_0[5]
uk_conf_pos = uk_0[6]
uk_conf_0_perc = uk_0[7]
uk_deceas_0_perc = uk_0[8]
In [73]:
# Preparing data related to US
us_0 = prep_country_data("US", us_pop, "Not applicable", 0)
us_conf_0 = us_0[0]
us_recov_0 = us_0[1]
us_deceas_0 = us_0[2]
us_act_0 = us_0[3]
us_conf_incr_0 = us_0[4]
us_deceas_incr_0 = us_0[5]
us_conf_pos = us_0[6]
us_conf_0_perc = us_0[7]
us_deceas_0_perc = us_0[8]

5.4.8. Data from Brazil, Russia and India

In [74]:
# Preparing data related to Brazil
brazil_0 = prep_country_data("Brazil", brazil_pop, "Not applicable", 0)
brazil_conf_0 = brazil_0[0]
brazil_recov_0 = brazil_0[1]
brazil_deceas_0 = brazil_0[2]
brazil_act_0 = brazil_0[3]
brazil_conf_incr_0 = brazil_0[4]
brazil_deceas_incr_0 = brazil_0[5]
brazil_conf_pos = brazil_0[6]
brazil_conf_0_perc = brazil_0[7]
brazil_deceas_0_perc = brazil_0[8]
In [75]:
# Preparing data related to Russia
russia_0 = prep_country_data("Russia", russia_pop, "Not applicable", 0)
russia_conf_0 = russia_0[0]
russia_recov_0 = russia_0[1]
russia_deceas_0 = russia_0[2]
russia_act_0 = russia_0[3]
russia_conf_incr_0 = russia_0[4]
russia_deceas_incr_0 = russia_0[5]
russia_conf_pos = russia_0[6]
russia_conf_0_perc = russia_0[7]
russia_deceas_0_perc = russia_0[8]
In [76]:
# Preparing data related to India
india_0 = prep_country_data("India", india_pop, "Not applicable", 0)
india_conf_0 = india_0[0]
india_recov_0 = india_0[1]
india_deceas_0 = india_0[2]
india_act_0 = india_0[3]
india_conf_incr_0 = india_0[4]
india_deceas_incr_0 = india_0[5]
india_conf_pos = india_0[6]
india_conf_0_perc = india_0[7]
india_deceas_0_perc = india_0[8]

5.4.9. Data from China

In [77]:
# Daily Report from China broken by Provinces
daily_rep_short[daily_rep_short['Country_Region'] == "China"]
Out[77]:
Province_State Country_Region Confirmed Deaths Recovered Active Incidence_Rate Case-Fatality_Ratio
98 Anhui China 991 6 985 0 1.56705 0.605449
99 Beijing China 938 9 928 1 4.35469 0.959488
100 Chongqing China 589 6 579 4 1.89877 1.01868
101 Fujian China 427 1 409 17 1.08348 0.234192
102 Gansu China 170 2 168 0 0.644672 1.17647
103 Guangdong China 1895 8 1851 36 1.67019 0.422164
104 Guangxi China 260 2 258 0 0.527812 0.769231
105 Guizhou China 147 2 145 0 0.408333 1.36054
106 Hainan China 171 6 165 0 1.83084 3.50877
107 Hebei China 368 6 359 3 0.48703 1.63043
108 Heilongjiang China 948 13 935 0 2.51259 1.37131
109 Henan China 1283 22 1257 4 1.33576 1.71473
110 Hong Kong China 5280 105 5019 156 70.4283 1.98864
111 Hubei China 68139 4512 63627 0 115.158 6.62176
112 Hunan China 1019 4 1015 0 1.47703 0.392542
113 Inner Mongolia China 275 1 266 8 1.08524 0.363636
114 Jiangsu China 670 0 665 5 0.832195 0
115 Jiangxi China 935 1 934 0 2.01162 0.106952
116 Jilin China 157 2 155 0 0.580621 1.27389
117 Liaoning China 280 2 271 7 0.642349 0.714286
118 Macau China 46 0 46 0 7.08409 0
119 Ningxia China 75 0 75 0 1.09012 0
120 Qinghai China 18 0 18 0 0.298507 0
121 Shaanxi China 438 3 414 21 1.13354 0.684932
122 Shandong China 845 7 827 11 0.841047 0.828402
123 Shanghai China 1114 7 1028 79 4.59571 0.628366
124 Shanxi China 209 0 204 5 0.56213 0
125 Sichuan China 733 3 701 29 0.878792 0.409277
126 Tianjin China 256 3 242 11 1.64103 1.17188
127 Tibet China 1 0 1 0 0.0290698 0
128 Xinjiang China 902 3 899 0 3.62686 0.332594
129 Yunnan China 211 2 204 5 0.436853 0.947867
130 Zhejiang China 1283 1 1279 3 2.23636 0.0779423
In [78]:
print("Number of entries related to China:")
len(daily_rep_short[daily_rep_short['Country_Region'] == "China"])
Number of entries related to China:
Out[78]:
33
In [79]:
# Extracting data related to Hubei province by screning out the text variables
# and putting the result in list format
hubei_conf_0 = world_conf_short[(world_conf_short['Country/Region'] == 'China') & \
                                (world_conf_short['Province/State'] == 'Hubei')]
hubei_conf_0 = hubei_conf_0.iloc[:, 2:].values.tolist()[0]
hubei_conf_incr_0 = calc_increments(hubei_conf_0)
hubei_conf_0_perc = pop_perc(hubei_conf_0, hubei_pop)

hubei_recov_0 = world_recov_short[(world_recov_short['Country/Region'] == 'China') & \
                                  (world_recov_short['Province/State'] == 'Hubei')]
hubei_recov_0 = hubei_recov_0.iloc[:, 2:].values.tolist()[0]

hubei_deceas_0 = world_deceas_short[(world_deceas_short['Country/Region'] == 'China') & \
                                    (world_deceas_short['Province/State'] == 'Hubei')]
hubei_deceas_0 = hubei_deceas_0.iloc[:, 2:].values.tolist()[0]

hubei_act_0 = list(np.array(hubei_conf_0) - \
                   np.array(hubei_recov_0) - \
                   np.array(hubei_deceas_0))

# Extracting data related to all the other provinces, making the sum
# and putting the result in list format
restchina_conf_0 = world_conf_short[(world_conf_short['Country/Region'] == 'China') & \
                                    (world_conf_short['Province/State'] !=  'Hubei')]
restchina_conf_0 = restchina_conf_0.groupby(['Country/Region']).sum()
restchina_conf_0 = restchina_conf_0.values.tolist()[0]
restchina_conf_incr_0 = calc_increments(restchina_conf_0)
restchina_conf_0_perc = pop_perc(restchina_conf_0, restchina_pop)

restchina_recov_0 = world_recov_short[(world_recov_short['Country/Region'] == 'China') & \
                                      (world_recov_short['Province/State'] !=  'Hubei')]
restchina_recov_0 = restchina_recov_0.groupby(['Country/Region']).sum()
restchina_recov_0 = restchina_recov_0.values.tolist()[0]

restchina_deceas_0 = world_deceas_short[(world_deceas_short['Country/Region'] == 'China') & \
                                        (world_deceas_short['Province/State'] !=  'Hubei')]
restchina_deceas_0 = restchina_deceas_0.groupby(['Country/Region']).sum()
restchina_deceas_0 = restchina_deceas_0.values.tolist()[0]

restchina_act_0 = list(np.array(restchina_conf_0) - \
                       np.array(restchina_recov_0) - \
                       np.array(restchina_deceas_0))

5.5. Summary of the Created Datasets

Within this document, different datasets are used for different purposes. This section provides a summary as a useful reference and describes the naming rules that have been used. Those variables that have been created temporarily just for reason of code clarity are not included in this list.

world_conf_clean

  • Dataframe based on world_confirmed (ime_series_covid19_confirmed_global.csv)
  • The NaN cells in the Province/State columns have been changed into strings with value "Not applicable"

world_recov_clean

  • Dataframe based on world_recovered (ime_series_covid19_recovered_global.csv)
  • The NaN cells in the Province/State columns have been changed into strings with value "Not applicable"

world_deceas_clean

  • Dataframe based on world_deceased (ime_series_covid19_deaths_global.csv)
  • The NaN cells in the Province/State columns have been changed into strings with value "Not applicable"

daily_rep_clean

  • Dataframe based on daily_report (mm-dd-yyyy.csv)
  • The NaN cells in the Province/State columns have been changed into strings with value "Not applicable"



world_conf_short, world_recov_short, world_deceas_short

  • Dataframe based on world_conf_clean, world_recov_clean, world_deceas_clean
  • GPS coordinates have been dropped

world_conf, world_recov, world_deceas

  • Dataframe based on world_conf_short, world_recov_short, world_deceas_short
  • Only columns with daily data have been selected

world_conf_tot, world_recov_tot, world_deceas_tot

  • Dataframe based on world_conf, world_recov, world_deceas
  • The overall worldwide daily sum has been calculated

world_act_tot

  • List based on world_conf_x, world_recov_x, world_deceas_x (the second and third are subtracted from the first) containing the active cases

world_conf_incr

  • Dataframe based on world_conf_tot containing the daily increments

world_deceas_incr

  • Dataframe based on world_deceas_tot containing the daily increments

daily_rep_short

  • Dataframe based on daily_rep_clean
  • All columns not containing cases counts have been dropped



world_conf_group, world_recov_group, world_deceas_group

  • Dataframe based on world_conf_short, world_recov_short, world_deceas_short
  • Data grouped by Country/Region

daily_rep_group

  • Dataframe based on daily_rep_short
  • Data grouped by Country/Region



days_tot

  • List obtained by using world_confirmed which contains the dates of all the days in m/d format

days_fin

  • List based on days_tot where the first 6 days have been dropped



country_conf_x, country_recov_x, country_deceas_x

  • where country is the Country written with small letters
  • where x is the number of days to skip in the time series starting from the first one
  • Lists obtained by using world_confirmed, world_recovered, world_deceased
  • Data related to Country has been extracted
  • Data related to the first x days has been dropped

country_act_x

  • List based on country_conf_x, country_recov_x, country_deceas_x (the second and third are subtracted from the first) containing the active cases

country_conf_incr_x

  • List based on country_conf_x containing the daily increments

country_deceas_incr_x

  • List based on country_deceas_x containing the daily increments

country_conf_0_perc

  • List based on country_conf_0
  • It containing the confirmed cumulative cases in percentage of the total population

country_deceas_0_perc

  • List based on country_deceas_0
  • It containing the deceased cumulative cases in percentage of the total population

country_conf_pos

  • where country is the Country written with small letters
  • Lists based on country_conf_0
  • Data related to days with zero cumulative cases in the Country has been dropped

6. Domain-Specific Concepts

The basic reproductive number, R0 is the average number of secondary infections generated by one infectious individual. When R0 > 1 the infection is able to spread. The aim of the non-pharmaceutical interventions (NPIs), as social distancing, is to reduce the value of R0.

https://www.imperial.ac.uk/media/imperial-college/medicine/sph/ide/gida-fellowships/Imperial-College-COVID19-transmissibility-25-01-2020.pdf

The Case Fatality Ratio (CFR) is the proportion of detected cases of a given disease that die as a result of it.

Surveillance is typically biased towards detecting clinically severe cases, particularly at the start of an epidemic when diagnostic capacity is limited. This leads to an over estimation of the CFR.

On the other hand, there is a time interval (2/3 weeks) between the onset of symptoms and death or recovery. Therefore, measuring the simple ratio deceased/infected during a growing epidemic does not allow to observe the outcome of all the infected cases, leading to a under estimation of the CFR.

https://www.imperial.ac.uk/media/imperial-college/medicine/sph/ide/gida-fellowships/Imperial-College-COVID19-severity-10-02-2020.pdf

NOTE: The Infection Fatality Rate is the percentage of people that get the infection and then die. This number is much harder to estimate compared to the CFR since we do not know the total amount of people that have been really infected in a certain area.

7. Data Visualization

7.1. Overview

7.1.1. General Comments to the Plots

The following curves are shown in the plots contained in this section:

  • Cumulative confirmed cases
  • Cumulative recovered cases
  • Cumulative deceased cases
  • Cumulative active cases
  • Daily increments in the confirmed cases
  • Daily increments in the recovered cases
  • Daily increments in the deceased cases
  • Daily increments in the active cases

The first four curves show the cumulative cases in a certain region since the start of the epidemic.

The cumulative confirmed cases curve is expected to grow exponentially and then slowly smoothing out towards a horizontal shape. Government decisions and people behavior can affect the way this curve looks like. The aim is to keep the curve not too steep in order not to saturate the capacity of the hospitals in the Country. However, it should be noted that the effects of Government and people actions are not immediate due to the incubation period.

The cumulative recovered cases curve follows the cumulative confirmed cases with a certain delay in time and a lower y value due to the amount of deceased cases.

The cumulative active cases are given by the confirmed cases minus the recovered cases minus the deceased cases. It is the only one of the cumulative cases curves that can decrease over time and this happens when the number of confirmed cases grows slower than the combined number of recovered and deceased cases. This curve is expected to have an (upside down) bell shape.

The new confirmed daily cases show the speed at which the virus is spreading. This curve is expected to have an (upside down) bell shape. This curve shows the daily values and therefore is shows also some noise. Some of this noise might be due to mistakes in reporting the daily data (sometimes data of a certain day is reported together with the next day data). This kind of mistake does not affect the grand total and affects only very little the trend of the curves.

The new recovered daily cases curve looks similar to the new confirmed daily cases curve with a delay in time and lower y values.

The incremental daily active cases curve shows two picks of opposite sign. The x value where the negative curve starts corresponds to the pick of the corresponding cumulative curve.

NOTE: The number of the actual confirmed cases is very likely above the number of the counted confirmed cases since not all population is tested and there might be many infected persons showing no symptoms. However, by assuming a constant testing policy during the all observation period, the rate of changes is unaffected by systematic under-reporting and therefore there is a lot of useful information that can be obtained by those curves.


"The only real data we have is from the flights used by a number of Countries to repatriate their citizens. The all population was tested on those planes. If the population samples given by the passengers of those flights would be representative of the all population, we could conclude that the epidemic is at least 3 times larger compared to what the collected data shows."

Feb 12th, Prof. Neil Ferguson, https://www.imperial.ac.uk/people/neil.ferguson


"By comparing the number of flights that came into a certain Country from the worst affected area in China (Wuhan City) with the cases detected in that Country, it can be bound that the number of cases per flight varies quite a lot depending on the Country.

Singapore had a relatively high number of cases compared to other Countries. By using that data as a benchmark, that is, by assuming the Singapore has detected all the cases, the result is that worldwide approximately 2/3 of the cases have not been detected."

Professor Christl Donnelly, https://www.imperial.ac.uk/people/c.donnelly


More recent serological tests show that the number of actual cases might be up to 10/20 times the number of counted confirmed cases.

7.1.2. A Reference Curve Set

The first complete curves are related to China. Let's analyze the curves related to China either than Hubei province. The curves can be divided in 4 phases which are named here after the shape of the cumulative confirmed cases curve.

1) Exponential increase phase

  • In the first phase the number of cumulative confirmed cases grows exponentially (it grows and it grows faster each day) while the number of recovered and deceased cases is still null (the number of cumulative confirmed cases corresponds to the number of active cases, which corresponds to the number of the "Infectious" in the popular epidemiologic SIR model (')). The increments in the number of confirmed cases shows the left side of a bell shape. The same happens for the incremental active cases.

2) Linear increase phase

  • In the second phase the number of confirmed cases grows at a quite constant speed (the cumulative cases grow in a straight line and the increment curve starts to flatten). In the middle of this phase we see the pick in the number of incremental confirmed cases (R0 has decreased to 1). There is also a pick on the incremental active cases. In this phase we see a quite modest increase in recovered and deceased cases and we start to see that the cumulative active cases curve and the cumulative confirmed cases curve take their own path.

3) Slowed-down increase phase

  • In the third phase the number of confirmed cases grows at a slower and slower speed (the cumulative curve starts to flatten towards a horizontal shape and the incremental confirmed cases curve shows the right side of the bell). R0 starts to decrease below 1. In this phase the number of cumulative recovered and deceased cases keeps growing and the number of cumulative active cases reaches a pick and then starts to decrease. The pick on the cumulative active cases is known as Herd Immunity. In the incremental active cases curve this is seen as the point when the curve changes sign. There is a lag in time between the pick in new confirmed cases seen in phase 2 and the pick in active cases seen in this phase.

4) No increase phase

  • In the fourth phase the number of cumulative confirmed cases remains constant and consequently the corresponding incremental curve is zero (R0 is almost 0). The number of recovered and deceased cases keeps growing and the active cases decrease down towards zero.

Note that a new wave might follow (as it might happen in China outside Hubei).

Note that should the testing policy change during the observation period, the curve might look different.

Whenever containment measures have been adopted in a certain area, the earliest moment in time when it makes sense to start to release them gradually is after the Herd Immunity pick. However, in this case the Herd Immunity has been obtained under certain conditions (the containment measures) and therefore, as soon as those conditions are released, the Heard Immunity is no longer valid. Release of containment measures might cause the curves to differ from this example and might lead to new picks before the active cases curves goes to zero.

(') https://medium.com/data-for-science/epidemic-modeling-101-or-why-your-covid19-exponential-fits-are-wrong-97aa50c55f8

In [80]:
# Plotting daily cumulative cases in the rest of China
cust_line_plot((days_tot, restchina_conf_0, ".", '-', 0, "confirmed cases"),
               (days_tot, restchina_recov_0, ".", '-', 2, "recovered cases"),
               (days_tot, restchina_deceas_0, ".", '-', 3, "deceased cases"),
               (days_tot, restchina_act_0, ".", '-', 1, "active cases"),
               figsize_w=18, figsize_h=12,
               title="Coronavirus COVID-19 cumulative cases in China "\
                     "either than Hubei over time",
               title_fs=18, title_offset=20,
               rem_borders=True,
               label_fs=12, tick_fs=10, 
               x_label="month/day",
               rot=90,
               y_label=None,
               legend=True,
               leg_fs=12,
               first_line_x='1/30', first_line_col=7, first_line_ls=':',
               first_line_x_l='End of the exponential increase phase',
               second_line_x='2/5', second_line_col=7, second_line_ls='--',
               second_line_x_l='End of the linear increase phase',
               third_line_x='2/22', third_line_col=7, third_line_ls='-.',
               third_line_x_l='End of the slowed-down increase phase',
               fourth_line_x='3/13', fourth_line_col=7, fourth_line_ls='-',
               fourth_line_x_l='End of the no increase phase',
               fifth_line_x='2/11', fifth_line_col=6, fifth_line_ls='--',
               fifth_line_x_l='Herd Immunity')
In [81]:
# Plotting daily increments in confirmed cases in the rest of China
cust_bar_plot((days_tot, restchina_conf_incr_0, 0, "New daily confirmed cases"),
               figsize_w=18, figsize_h=12,
               title="Coronavirus COVID-19 new daily confirmed cases in China "\
                     "either than Hubei",
               title_fs=18, title_offset=20,
               rem_borders=True,
               label_fs=12, tick_fs=10, 
               x_label="month/day",
               rot=90,
               y_label=None,
               legend=True,
               leg_fs=12,
               first_line_x='1/30', first_line_col=7, first_line_ls=':',
               first_line_x_l='End of the exponential increase phase',
               second_line_x='2/5', second_line_col=7, second_line_ls='--',
               second_line_x_l='End of the linear increase phase',
               third_line_x='2/22', third_line_col=7, third_line_ls='-.',
               third_line_x_l='End of the slowed-down increase phase',
               fourth_line_x='3/13', fourth_line_col=7, fourth_line_ls='-',
               fourth_line_x_l='End of the no increase phase',
               fifth_line_x='2/11', fifth_line_col=6, fifth_line_ls='--',
               fifth_line_x_l='Herd Immunity')
In [82]:
# Plotting new daily deceased cases in the rest of China
cust_bar_plot((days_tot, calc_increments(restchina_deceas_0), 3, 
               "Daily deceased cases by COVID-19"),
              figsize_w=18, figsize_h=12,
              title="Coronavirus COVID-19 new daily (reported) deceased cases "\
                    "in China either than Hubei",
              title_fs=18, title_offset=20,
              rem_borders=True,
              label_fs=12, tick_fs=10, 
              x_label="month/day",
              rot=90,
              y_label=None,
              legend=True,
              leg_fs=12,
              legend_loc=0,
              first_line_x='1/30', first_line_col=7, first_line_ls=':',
              first_line_x_l='End of the exponential increase phase',
              second_line_x='2/5', second_line_col=7, second_line_ls='--',
              second_line_x_l='End of the linear increase phase',
              third_line_x='2/22', third_line_col=7, third_line_ls='-.',
              third_line_x_l='End of the slowed-down increase phase',
              fourth_line_x='3/13', fourth_line_col=7, fourth_line_ls='-',
              fourth_line_x_l='End of the no increase phase',
              fifth_line_x='2/11', fifth_line_col=6, fifth_line_ls='--',
              fifth_line_x_l='Herd Immunity')
In [83]:
# Plotting new daily recovered cases in the rest of China
cust_bar_plot((days_tot, calc_increments(restchina_recov_0), 2, 
               "Daily recovered cases by COVID-19"),
              figsize_w=18, figsize_h=12,
              title="Coronavirus COVID-19 new daily (reported) recovered cases "\
                    "in China either than Hubei",
              title_fs=18, title_offset=20,
              rem_borders=True,
              label_fs=12, tick_fs=10, 
              x_label="month/day",
              rot=90,
              y_label=None,
              legend=True,
              leg_fs=12,
              legend_loc=0,
              first_line_x='1/30', first_line_col=7, first_line_ls=':',
              first_line_x_l='End of the exponential increase phase',
              second_line_x='2/5', second_line_col=7, second_line_ls='--',
              second_line_x_l='End of the linear increase phase',
              third_line_x='2/22', third_line_col=7, third_line_ls='-.',
              third_line_x_l='End of the slowed-down increase phase',
              fourth_line_x='3/13', fourth_line_col=7, fourth_line_ls='-',
              fourth_line_x_l='End of the no increase phase',
              fifth_line_x='2/11', fifth_line_col=6, fifth_line_ls='--',
              fifth_line_x_l='Herd Immunity')
In [84]:
# Plotting daily increments in the active cases in the rest of China
cust_bar_plot((days_tot, calc_increments(restchina_act_0), 1, 
               "Daily increments in the active cases by COVID-19"),
              figsize_w=18, figsize_h=12,
              title="Coronavirus COVID-19 increments in the daily active cases "\
                    "in China either than Hubei",
              title_fs=18, title_offset=20,
              rem_borders=True,
              label_fs=12, tick_fs=10, 
              x_label="month/day",
              rot=90,
              y_label=None,
              legend=True,
              leg_fs=12,
              legend_loc=0,
              first_line_x='1/30', first_line_col=7, first_line_ls=':',
              first_line_x_l='End of the exponential increase phase',
              second_line_x='2/5', second_line_col=7, second_line_ls='--',
              second_line_x_l='End of the linear increase phase',
              third_line_x='2/22', third_line_col=7, third_line_ls='-.',
              third_line_x_l='End of the slowed-down increase phase',
              fourth_line_x='3/13', fourth_line_col=7, fourth_line_ls='-',
              fourth_line_x_l='End of the no increase phase',
              fifth_line_x='2/11', fifth_line_col=6, fifth_line_ls='--',
              fifth_line_x_l='Herd Immunity')

7.2. Finnish Internal Situation

Unfortunately, the Finnish Institute for Health and Welfare (THL) does not publish reliable daily data about the recovered cases and therefore it is not possible to draw an accurate curve for the active cases.

Notes:

The increased speed in the confirmed cases on 4/4 is due to change in testing policy.

The confirmed cases data from 3/12 has been reported on 3/13.

There is something wrong in the source data since for certain days the cumulative value is smaller than the previous day (see below).

In [85]:
print("Error data in confirmed cases in Finland:")
find_error_days(finland_conf_0)
Error data in confirmed cases in Finland:
['7/15', '7/16']
In [86]:
print("Error data in deceased cases in Finland:")
find_error_days(finland_deceas_0)
Error data in deceased cases in Finland:
['4/6', '6/1', '7/15', '9/30']
In [87]:
# Plotting daily cumulative cases in Finland
cust_line_plot((days_fin, finland_conf_6, ".", '-', 0, "confirmed cases"),
               #(days_fin, finland_recov_6, ".", '-', 2, "recovered cases"),
               (days_fin, finland_deceas_6, ".", '-', 3, "deceased cases"),
               #(days_fin, finland_act_6, ".", '-', 1, "active cases"),
               figsize_w=18, figsize_h=12,
               title="Coronavirus COVID-19 cumulative cases in Finland over time",
               title_fs=18, title_offset=20,
               rem_borders=True,
               label_fs=12, tick_fs=10, 
               x_label="month/day",
               vis_xticks=7,
               rot=90,
               y_label=None,
               legend=True,
               leg_fs=12,
               legend_loc=0,
               first_line_x='3/12', first_line_col=6,
               first_line_ls=':', first_line_x_l='First actions',
               second_line_x='3/16', second_line_col=6,
               second_line_ls='--', second_line_x_l='State of emergency declared',
               third_line_x='3/28', third_line_col=6,
               third_line_ls='-.', third_line_x_l='Additional actions',
               fourth_line_x='4/11', fourth_line_col=6,
               fourth_line_ls='-', fourth_line_x_l='Tighter border control',
               fifth_line_x='4/15', fifth_line_col=8,
               fifth_line_ls='-.', fifth_line_x_l='Uusima border opened',
               sixth_line_x='5/14', sixth_line_col=8,
               sixth_line_ls='--', sixth_line_x_l='More releasing misures',
               seventh_line_x='6/1', seventh_line_col=8,
               seventh_line_ls=':', seventh_line_x_l='Further releasing',
               eighth_line_x='6/15', eighth_line_col=8,
               eighth_line_ls='-', eighth_line_x_l='End of state of emergency')
In [88]:
print("Concrete actions by the Finnish government:")
measures.style.set_properties(**{'text-align': 'left'}).\
set_table_styles([ dict(selector='th', props=[('text-align', 'left')] ) ]).hide_index()
Concrete actions by the Finnish government:
Out[88]:
Date Actions
12.3. First containment measures: gathering of more than 500 people banned
16.3. State of emergency declared: closing shools, universities, museums, theatres, libraries, sport facilities; gathering of more than 10 people banned
28.3. Additional containment measures: Uusimaa region borders closed, restaurant dining forbidden
11.4. Additional containment measures: No passengers in ships from Germany, Sweden, Estonia
15.4. First releasing measures: Uusima border re-opened
14.5. More releasing misures: schools opening, business travell allowed within Schengen
1.6. Further releasing: gathering up to 50 people allowed, reopening of bars and restaurants, reopening of museums and theatres
15.6. End of state of emergency
In [89]:
# Plotting new daily confirmed Coronavirus cases in Finland
cust_bar_plot((days_fin, finland_conf_incr_6, 0, "New daily confirmed cases"),
               figsize_w=18, figsize_h=12,
               title="Coronavirus COVID-19 new daily confirmed cases in Finland",
               title_fs=18, title_offset=20,
               rem_borders=True,
               label_fs=12, tick_fs=10, 
               x_label="month/day",
               rot=90,
               y_label=None,
               legend=True,
               leg_fs=12,
               legend_loc=2,
               first_line_x='3/12', first_line_col=6,
               first_line_ls=':', first_line_x_l='First actions',
               second_line_x='3/16', second_line_col=6,
               second_line_ls='--', second_line_x_l='State of emergency declared',
               third_line_x='3/28', third_line_col=6,
               third_line_ls='-.', third_line_x_l='Additional actions',
               fourth_line_x='4/11', fourth_line_col=6,
               fourth_line_ls='-', fourth_line_x_l='Tighter border control',
               fifth_line_x='4/15', fifth_line_col=8,
               fifth_line_ls='-.', fifth_line_x_l='Uusima border opened',
               sixth_line_x='5/14', sixth_line_col=8,
               sixth_line_ls='--', sixth_line_x_l='More releasing misures',
               seventh_line_x='6/1', seventh_line_col=8,
               seventh_line_ls=':', seventh_line_x_l='Further releasing',
               eighth_line_x='6/15', eighth_line_col=8,
               eighth_line_ls='-', eighth_line_x_l='End of state of emergency')
In [90]:
# Plotting new daily deceased cases in Finland
cust_bar_plot((days_fin, finland_deceas_incr_6, 3, ""),
               figsize_w=18, figsize_h=12,
               title="Coronavirus COVID-19 new daily deceased cases in Finland",
               title_fs=18, title_offset=20,
               rem_borders=True,
               label_fs=12, tick_fs=10, 
               x_label="month/day",
               rot=90,
               y_label=None,
               legend=False,
               leg_fs=12,
               legend_loc=0)

7.3. Comparison with the Closest Neighboring Countries

Sweden and Russia have a much higher number of cumulative confirmed cases per capita compared to Finland. Also, they have a much higher number of new daily confirmed cases.

The number of confirmed cases per capita considering the all world is similar to Finland but it is on an increasing path whereas Finnish curve tends to remain constant.

Also, Sweden has currently a much higher number of cumulative deaths per capita compared to Finland. Therefore, at least for Sweden, it is unlikely that the comparison is biased by a different testing policy.

In [91]:
# Comparing Finnish per capita cumulative confirmed cases with Sweden and Russia
cust_line_plot((days_tot, finland_conf_0_perc, ".", '-', 0, "Finland"),
               (days_tot, sweden_conf_0_perc, ".", '-', 1, "Sweden"),
               (days_tot, norway_conf_0_perc, ".", '-', 6, "Norway"),
               (days_tot, russia_conf_0_perc, ".", '-', 3, "Russia"),
               (days_tot, estonia_conf_0_perc, ".", '-', 7, "Estonia"),
               (days_tot, world_conf_perc, ".", '-', 4, "World Total"),
               figsize_w=18, figsize_h=12,
               title="COVID-19 per capita cumulative confirmed cases "\
                     "in Finland compared to the closest neighboring Countries",
               title_fs=18, title_offset=20,
               rem_borders=True,
               label_fs=12, tick_fs=10, 
               x_label="month/day",
               rot=90,
               y_label=None,
               legend=True,
               leg_fs=12,
               legend_loc=0)
In [92]:
# Comparing Finnish per capita cumulative deceased cases with Sweden and Russia
cust_line_plot((days_tot, finland_deceas_0_perc, ".", '-', 0, "Finland"),
               (days_tot, sweden_deceas_0_perc, ".", '-', 1, "Sweden"),
               (days_tot, norway_deceas_0_perc, ".", '-', 6, "Norway"),
               (days_tot, russia_deceas_0_perc, ".", '-', 3, "Russia"),
               (days_tot, estonia_deceas_0_perc, ".", '-', 7, "Estonia"),
               (days_tot, world_deceas_perc, ".", '-', 4, "World Total"),
               figsize_w=18, figsize_h=12,
               title="COVID-19 per capita cumulative deceased cases "\
                     "in Finland compared to the closest neighboring Countries",
               title_fs=18, title_offset=20,
               rem_borders=True,
               label_fs=12, tick_fs=10, 
               x_label="month/day",
               rot=90,
               y_label=None,
               legend=True,
               leg_fs=12,
               legend_loc=0)
In [93]:
# Comparing Finnish per capita daily confirmed cases with the closest neighboring Countries

finland_conf_incr_0_perc = pop_perc(finland_conf_incr_0, finland_pop)
sweden_conf_incr_0_perc = pop_perc(sweden_conf_incr_0, sweden_pop)
norway_conf_incr_0_perc = pop_perc(norway_conf_incr_0, norway_pop)
russia_conf_incr_0_perc = pop_perc(russia_conf_incr_0, russia_pop)
estonia_conf_incr_0_perc = pop_perc(estonia_conf_incr_0, estonia_pop)

plot_stacked_bar(days_tot,
                 [finland_conf_incr_0_perc, sweden_conf_incr_0_perc, norway_conf_incr_0_perc,
                  russia_conf_incr_0_perc, estonia_conf_incr_0_perc],
                 ["Finland", "Sweden", "Norway", "Russia", "Estonia"],
                 col=[0, 1, 6, 3, 7],
                 multidim=True, figsize_w=18, figsize_h=12,
                 title="COVID-19 per capita daily confirmed cases in "\
                       "Finland compared to the closest neighboring Countries",
                 title_fs=18,
                 frame=False,
                 category_labels=days_tot,
                 label_fs = 12, ticks_fs=10, 
                 x_label="month/day", rot=90,
                 y_label="Total of cases in all the Countries",
                 legend=True, legend_loc = 2, legend_fs=12,
                 add_text=None, addtext_x=0, addtext_y=0, addtext_fs=10)

7.3.1. Comparison with Other Scandinavian Countries and Estonia

Description of the plots of this section

It appears that the Finnish curve is quite smooth compared to the other curves. Only Iceland and Estonia have a smoother curve. This would suggest that the virus is not spreading faster in Finland compared to most of the other Scandinavian Countries. By shifting all the curves so that they start for each Country in the day of the first confirmed case, the Finnish curve is the slowest to grow but then crosses the curves of Iceland and Estonia.

Even though the virus started later in Finland, the first recovered case happened much earlier than other Scandinavian Countries.

Finland has the lowest number of deceased cases after Norway and Estonia (Sweden has the highest).

The high numbers for Sweden do not surprise due to the quite relaxed containment policy in the Country.

NOTE: It should be noted that the testing policy in each Country affects considerably the way the curve looks like. The less people you test, the better the curve looks like.

NOTE: The data from Denmark does not include Faroe Islands and Greenland.

In [94]:
# Comparing cumulative confirmed cases over time in Scandinavia plus Estonia
cust_line_plot((days_fin, finland_conf_6, ".", '-', 0, "Finland"),
               (days_fin, denmark_conf_6, ".", '-', 3, "Denmark"),
               (days_fin, norway_conf_6, ".", '-', 6, "Norway"),
               (days_fin, sweden_conf_6, ".", '-', 8, "Sweden"),
               (days_fin, iceland_conf_6, ".", '-', 4, "Iceland"),
               (days_fin, estonia_conf_6, ".", '-', 7, "Estonia"),
               figsize_w=18, figsize_h=12,
               title="Coronavirus COVID-19 cumulative confirmed cases in "\
                     "Scandinavia and Estonia over time",
               title_fs=18, title_offset=20,
               rem_borders=True,
               label_fs=12, tick_fs=10, 
               x_label="month/day",
               rot=90,
               y_label=None,
               legend=True,
               leg_fs=12,
               legend_loc=0)
In [95]:
# Comparing cumulative confirmed cases over time in Scandinavia plus Estonia
# starting form the day of the first confirmed case in Finland
cust_line_plot((list(range(len(finland_conf_pos))), finland_conf_pos,
                ".", '-', 0, "Finland"),
               (list(range(len(denmark_conf_pos))), denmark_conf_pos,
                ".", '-', 3, "Denmark"),
               (list(range(len(norway_conf_pos))), norway_conf_pos,
                ".", '-', 6, "Norway"),
               (list(range(len(sweden_conf_pos))), sweden_conf_pos,
                ".", '-', 8, "Sweden"),
               (list(range(len(iceland_conf_pos))), iceland_conf_pos,
                ".", '-', 4, "Iceland"),
               (list(range(len(estonia_conf_pos))), estonia_conf_pos,
                ".", '-', 7, "Estonia"),
               figsize_w=18, figsize_h=12,
               title="Coronavirus COVID-19 cumulative confirmed cases "\
                     "in Scandinavia and Estonia over time",
               title_fs=18, title_offset=20,
               rem_borders=True, 
               label_fs=12, tick_fs=10, 
               x_label="Days since the first confirmed case in the Country",
               vis_xticks=1,
               rot=0,
               y_label=None,
               legend=True,
               leg_fs=12,
               legend_loc=0)
In [96]:
# Comparing new daily confirmed Coronavirus cases in Scandinavia plus Estonia
cust_line_plot((days_fin, finland_conf_incr_6, ".", '-', 0, "Finland"),
               (days_fin, denmark_conf_incr_6, ".", '-', 3, "Denmark"),
               (days_fin, norway_conf_incr_6, ".", '-', 6, "Norway"),
               (days_fin, sweden_conf_incr_6, ".", '-', 8, "Sweden"),
               (days_fin, iceland_conf_incr_6, ".", '-', 4, "Iceland"),
               (days_fin, estonia_conf_incr_6, ".", '-', 7, "Estonia"),
               figsize_w=18, figsize_h=12,
               title="Coronavirus COVID-19 new daily confirmed cases "\
                     "in Scandinavia and Estonia",
               title_fs=18, title_offset=20,
               rem_borders=True,
               label_fs=12, tick_fs=10, 
               x_label="month/day",
               rot=90,
               y_label=None,
               legend=True,
               leg_fs=12,
               legend_loc=0)

Comments to the next two plots:

Data related to Iceland is corrupted (cumulative data cannot decrease) so the related plot is not shown.

In [97]:
# Comparing cumulative recovered cases over time in Scandinavia plus Estonia
plot_stacked_bar(days_fin,
                 [finland_recov_6, denmark_recov_6, norway_recov_6, sweden_recov_6, estonia_recov_6],
                 ["Finland", "Denmark", "Norway", "Sweden", "Estonia"],
                 col=[0, 3, 6, 8, 7],
                 multidim=True, figsize_w=18, figsize_h=12,
                 title="Coronavirus COVID-19 cumulative (reported) recovered cases in "\
                       "Scandinavia and Estonia over time",
                 title_fs=18,
                 frame=False,
                 category_labels=days_tot,
                 label_fs = 12, ticks_fs=10, 
                 x_label="month/day", rot=90,
                 y_label="Total of cases in all the Countries",
                 legend=True, legend_loc = 2, legend_fs=12,
                 add_text=None, addtext_x=0, addtext_y=0, addtext_fs=10)
In [98]:
# Comparing cumulative deceased cases over time in Scandinavia plus Estonia
plot_stacked_bar(days_fin,
                 [finland_deceas_6, denmark_deceas_6, norway_deceas_6, sweden_deceas_6, estonia_deceas_6],
                 ["Finland", "Denmark", "Norway", "Sweden", "Estonia"],
                 col=[0, 3, 6, 8, 7],
                 multidim=True, figsize_w=18, figsize_h=12,
                 title="Coronavirus COVID-19 cumulative (reported) deceased cases in "\
                       "Scandinavia and Estonia over time",
                 title_fs=18,
                 frame=False,
                 category_labels=days_tot,
                 label_fs = 12, ticks_fs=10, 
                 x_label="month/day", rot=90,
                 y_label="Total of cases in all the Countries",
                 legend=True, legend_loc = 2, legend_fs=12,
                 add_text=None, addtext_x=0, addtext_y=0, addtext_fs=10)

7.4. Comparison with other European Countries

Finland has also the lowest curves compared to other European Countries (except for Luxemburg). However, it shall be noted that those are absolute values which are not normalized by taking into consideration the Country population.

The plots related to the new confirmed cases show the same pattern for all those Countries (except for Poland). This might be due to the fact that those plots are very much dependent on how many people are tested in a certain day.

Switzerland has managed to keep a relatively low curve. France has experienced a noticeable increase in the recorded confirmed cases around 4/11. Germany has managed to keep a low curve of the deceased cases despite the relatively high curve of the confirmed cases.

NOTE: When comparing those curves please note also that the testing policy in each Country affects considerably the way the curve looks like. The less people you test, the better the curve looks like.

NOTE: The data from France and Netherlands does not include offshore territories.

NOTE: Obviously, the following data is wrong since the cumulative data cannot decrease (leading to a negative daily increment):

In [99]:
print("Error data in confirmed cases in Spain:")
find_error_days(spain_conf_0)
print("Error data in confirmed cases in France:")
find_error_days(france_conf_0)
print("Error data in confirmed cases in Portugal:")
find_error_days(portugal_conf_0)
print("Error data in deceased cases in Spain:")
find_error_days(spain_deceas_0)
print("Error data in deceased cases in France:")
find_error_days(france_deceas_0)
Error data in confirmed cases in Spain:
['4/24', '5/25']
Error data in confirmed cases in France:
['4/18', '4/22', '4/29', '5/13', '5/16', '5/24', '5/26', '6/2', '6/24', '6/25', '6/27', '6/28', '7/4', '7/5', '7/8', '7/11', '7/12', '7/14', '7/18', '7/19', '7/25', '7/29', '8/1', '8/2', '8/3', '8/5', '8/8', '8/9', '8/11', '8/13', '8/16', '8/18', '8/19', '8/20', '8/29', '9/5', '9/6', '10/17']
Error data in confirmed cases in Portugal:
['5/2']
Error data in deceased cases in Spain:
['5/25', '8/12']
Error data in deceased cases in France:
['5/16', '5/19', '6/27', '7/8', '7/11', '7/14', '7/18', '7/21', '7/29', '8/3', '8/5', '8/8', '8/10', '8/18', '8/19', '9/5', '9/6']
In [100]:
# Comparing cumulative confirmed cases over time for Finland,
# Italy, Spain, Germany, France, Switzerland, Belgium, Netherlands and Portugal
cust_line_plot((days_tot, finland_conf_0, ".", '-', 0, "Finland"),
               (days_tot, italy_conf_0, ".", '-', 2, "Italy"),
               (days_tot, spain_conf_0, ".", '-', 1, "Spain"),
               (days_tot, germany_conf_0, ".", '-', 4, "Germany"),
               (days_tot, france_conf_0, ".", '-', 3, "France"),              
               (days_tot, switzerland_conf_0, ".", '-', 6, "Switzerland"),
               (days_tot, belgium_conf_0, ".", '-', 8, "Belgium"),
               (days_tot, netherlands_conf_0, ".", '-', 7, "Netherlands"),
               (days_tot, portugal_conf_0, ".", '-', 5, "Portugal"),
               figsize_w=18, figsize_h=12,
               title="Coronavirus COVID-19 cumulative confirmed cases "\
                     "in Finland compared to \nItaly, Spain, Germany, France "\
                     "Switzerland, Belgium, Netherlands and Portugal",
               title_fs=18, title_offset=20,
               rem_borders=True,
               label_fs=12, tick_fs=10, 
               x_label="month/day",
               rot=90,
               y_label=None,
               legend=True,
               leg_fs=12,
               legend_loc=0)
In [101]:
# Comparing cumulative deceased cases over time for Finland,
# Italy, Spain, Germany, France, Switzerland, Belgium, Netherlands and Portugal
cust_line_plot((days_tot, finland_deceas_0, ".", '-', 0, "Finland"),
               (days_tot, italy_deceas_0, ".", '-', 2, "Italy"),
               (days_tot, spain_deceas_0, ".", '-', 1, "Spain"),
               (days_tot, germany_deceas_0, ".", '-', 4, "Germany"),
               (days_tot, france_deceas_0, ".", '-', 3, "France"),              
               (days_tot, switzerland_deceas_0, ".", '-', 6, "Switzerland"),
               (days_tot, belgium_deceas_0, ".", '-', 8, "Belgium"),
               (days_tot, netherlands_deceas_0, ".", '-', 7, "Netherlands"),
               (days_tot, portugal_deceas_0, ".", '-', 5, "Portugal"),
               figsize_w=18, figsize_h=12,
               title="Coronavirus COVID-19 cumulative deceased cases "\
                     "in Finland compared to \nItaly, Spain, Germany, France "\
                     "Switzerland, Belgium and Netherlands and Portugal",
               title_fs=18, title_offset=20,
               rem_borders=True,
               label_fs=12, tick_fs=10, 
               x_label="month/day",
               rot=90,
               y_label=None,
               legend=True,
               leg_fs=12,
               legend_loc=0)
In [102]:
# Comparing cumulative confirmed cases over time for Finland,
# Austria, Luxembourg, Ireland and Poland
cust_line_plot((days_tot, finland_conf_0, ".", '-', 0, "Finland"),               
               (days_tot, austria_conf_0, ".", '-', 3, "Austria"),
               (days_tot, luxembourg_conf_0, ".", '-', 9, "Luxembourg"),
               (days_tot, ireland_conf_0, ".", '-', 1, "Ireland"),
               (days_tot, poland_conf_0, ".", '-', 6, "Poland"),
               figsize_w=18, figsize_h=12,
               title="Coronavirus COVID-19 cumulative confirmed cases "\
                     "in Finland compared to \nAustria, "\
                     "Luxembourg, Ireland and Poland",
               title_fs=18, title_offset=20,
               rem_borders=True,
               label_fs=12, tick_fs=10, 
               x_label="month/day",
               rot=90,
               y_label=None,
               legend=True,
               leg_fs=12,
               legend_loc=0)
In [103]:
# Comparing cumulative deceased cases over time for Finland,
# Austria, Luxembourg, Ireland and Poland
cust_line_plot((days_tot, finland_deceas_0, ".", '-', 0, "Finland"),               
               (days_tot, austria_deceas_0, ".", '-', 3, "Austria"),
               (days_tot, luxembourg_deceas_0, ".", '-', 9, "Luxembourg"),
               (days_tot, ireland_deceas_0, ".", '-', 1, "Ireland"),
               (days_tot, poland_deceas_0, ".", '-', 6, "Poland"),
               figsize_w=18, figsize_h=12,
               title="Coronavirus COVID-19 cumulative deceased cases "\
                     "in Finland compared to \nAustria, "\
                     "Luxembourg, Ireland and Poland",
               title_fs=18, title_offset=20,
               rem_borders=True,
               label_fs=12, tick_fs=10, 
               x_label="month/day",
               rot=90,
               y_label=None,
               legend=True,
               leg_fs=12,
               legend_loc=0)

7.5. Situation in China

Two sets of plots are shown here: one for the Hubei province where the infection has started and the other one for the rest of China.

The first plot of the two sets shows the cumulative confirmed cases broken down by deceased, recovered and active cases. Whereas in Hubei there has not been yet a second wave, that is the case for the rest of China.

The second plot shows separately the cumulative curves for the confirmed, recovered, deceased and active cases. The curve for the rest of China has been analyzed in details in section 7.1.2.

Note: There is something wrong in the source data for Hubei province on 4/17 since the cumulative recovered cases cannot decrease over time. Also, the incremental data (increment in confirmed cases) for Hubei province from 2/12 has been reported on 2/13.

In [104]:
print("Error data in confirmed cases in Hubei:")
find_error_days(hubei_recov_0)
Error data in confirmed cases in Hubei:
['4/17']
In [105]:
# Plotting daily cumulative cases in Hubei
plot_stacked_bar(days_tot,
                 [hubei_deceas_0, hubei_recov_0, hubei_act_0],
                 ["deceased cases", "recovered cases", "active cases"],
                 col=[3, 2, 1],
                 multidim=True, figsize_w=18, figsize_h=12,
                 title="COVID-19 cumulative cases in Hubei (China) over time",
                 title_fs=18,
                 frame=False,
                 category_labels=days_tot,
                 label_fs = 12, ticks_fs=10, 
                 x_label="month/day", rot=90,
                 y_label="confirmed cases",
                 legend=True, legend_loc = 2, legend_fs=12,
                 add_text=None, addtext_x=0, addtext_y=0, addtext_fs=10)
In [106]:
# Plotting daily cumulative cases in Hubei
cust_line_plot((days_tot, hubei_conf_0, ".", '-', 0, "confirmed cases"),
               (days_tot, hubei_recov_0, ".", '-', 2, "recovered cases"),
               (days_tot, hubei_deceas_0, ".", '-', 3, "deceased cases"),
               (days_tot, hubei_act_0, ".", '-', 1, "active cases"),
               figsize_w=18, figsize_h=12,
               title="Coronavirus COVID-19 cumulative cases in Hubei (China) over time",
               title_fs=18, title_offset=20,
               rem_borders=True,
               label_fs=12, tick_fs=10, 
               x_label="month/day",
               rot=90,
               y_label=None,
               legend=True,
               leg_fs=12,
               legend_loc=0)
In [107]:
# Plotting daily increments in confirmed cases in Hubei province in China
cust_bar_plot((days_tot, hubei_conf_incr_0, 0, ""),
               figsize_w=18, figsize_h=12,
               title="Coronavirus COVID-19 new daily confirmed cases "\
                     "in Hubei province (China)",
               title_fs=18, title_offset=20,
               rem_borders=True,
               label_fs=12, tick_fs=10, 
               x_label="month/day",
               rot=90,
               y_label=None,
               legend=False,
               leg_fs=12,
               legend_loc=0)
In [108]:
# Plotting daily cumulative cases in the rest of China
plot_stacked_bar(days_tot,
                 [restchina_deceas_0, restchina_recov_0, restchina_act_0],
                 ["deceased cases", "recovered cases", "active cases"],
                 col=[3, 2, 1],
                 multidim=True, figsize_w=18, figsize_h=12,
                 title="COVID-19 cumulative cases in China either than Hubei over time",
                 title_fs=18,
                 frame=False,
                 category_labels=days_tot,
                 label_fs = 12, ticks_fs=10, 
                 x_label="month/day", rot=90,
                 y_label="confirmed cases",
                 legend=True, legend_loc = 2, legend_fs=12,
                 add_text=None, addtext_x=0, addtext_y=0, addtext_fs=10)
In [109]:
# Plotting daily cumulative cases in the rest of China
cust_line_plot((days_tot, restchina_conf_0, ".", '-', 0, "confirmed cases"),
               (days_tot, restchina_recov_0, ".", '-', 2, "recovered cases"),
               (days_tot, restchina_deceas_0, ".", '-', 3, "deceased cases"),
               (days_tot, restchina_act_0, ".", '-', 1, "active cases"),
               figsize_w=18, figsize_h=12,
               title="Coronavirus COVID-19 cumulative cases in China "\
                     "either than Hubei over time",
               title_fs=18, title_offset=20,
               rem_borders=True,
               label_fs=12, tick_fs=10, 
               x_label="month/day",
               rot=90,
               y_label=None,
               legend=True,
               leg_fs=12,
               legend_loc=0)
In [110]:
# Plotting daily increments in confirmed cases in the rest of China
cust_bar_plot((days_tot, restchina_conf_incr_0, 0, ""),
               figsize_w=18, figsize_h=12,
               title="Coronavirus COVID-19 new daily confirmed cases in China "\
                     "either than Hubei",
               title_fs=18, title_offset=20,
               rem_borders=True,
               label_fs=12, tick_fs=10, 
               x_label="month/day",
               rot=90,
               y_label=None,
               legend=False,
               leg_fs=12,
               legend_loc=0)

7.6. Situation in Italy

Italy has been the first Country after China (and the first Country in Europe) that has been hit hard from the virus and its government, as opposite to Finland, keeps very comprehensive public data. Therefore, analysis of Italian curves might be useful also to have some hints about Finnish situation.

Italy has had a very confusing strategy and decision-making process in the beginning of the epidemic and this has been probably one of the causes of the high number of cases. However, after an initial period of very poor handling of the situation, quite strict containment measures have been decided and this has led to curves whose shape that are quite close to the curves from China with the main difference that the slowed down phase has been smoother. So, there has been a exponential increase in the number of confirmed cases, followed by a short linear phase and a quite long slowed down phase, which is still ongoing.

Note:

  • Data from 3/12 has been reported on 3/13.
  • The confirmed cases on 6/19 are wrong since the incremental value cannot be negative.
  • The data about daily deceased cases on 6/24 is wrong since the incremental value cannot be negative.
In [111]:
# Plotting daily cumulative cases in Italy
cust_line_plot((days_tot, italy_conf_0, ".", '-', 0, "confirmed cases"),
               (days_tot, italy_recov_0, ".", '-', 2, "recovered cases"),
               (days_tot, italy_deceas_0, ".", '-', 3, "deceased cases"),
               (days_tot, italy_act_0, ".", '-', 1, "active cases"),
               figsize_w=18, figsize_h=12,
               title="Coronavirus COVID-19 cumulative cases in Italy over time",
               title_fs=18, title_offset=20,
               rem_borders=True,
               label_fs=12, tick_fs=10, 
               x_label="month/day",
               rot=90,
               y_label=None,
               legend=True,
               leg_fs=12,
               legend_loc=0)
In [112]:
# Plotting daily cumulative cases in Italy
plot_stacked_bar(days_tot,
                 [italy_deceas_0, italy_recov_0, italy_act_0],
                 ["deceased cases", "recovered cases", "active cases"],
                 col=[3, 2, 1],
                 multidim=True, figsize_w=18, figsize_h=12,
                 title="COVID-19 cumulative cases in Italy over time",
                 title_fs=18,
                 frame=False,
                 category_labels=days_tot,
                 label_fs = 12, ticks_fs=10, 
                 x_label="month/day", rot=90,
                 y_label="confirmed cases",
                 legend=True, legend_loc = 2, legend_fs=12,
                 add_text=None, addtext_x=0, addtext_y=0, addtext_fs=10)
In [113]:
# Plotting new daily confirmed Coronavirus cases in Italy
cust_bar_plot((days_tot, italy_conf_incr_0, 0, ""),
               figsize_w=18, figsize_h=12,
               title="Coronavirus COVID-19 new daily confirmed cases in Italy",
               title_fs=18, title_offset=20,
               rem_borders=True,
               label_fs=12, tick_fs=10, 
               x_label="month/day",
               rot=90,
               y_label=None,
               legend=False,
               leg_fs=12,
               legend_loc=0)
In [114]:
# Plotting increments in the active cases in Italy
cust_bar_plot((days_tot, calc_increments(italy_act_0), 1, ""),
               figsize_w=18, figsize_h=12,
               title="Coronavirus COVID-19 increments in the active cases in Italy",
               title_fs=18, title_offset=20,
               rem_borders=True,
               label_fs=12, tick_fs=10, 
               x_label="month/day",
               rot=90,
               y_label=None,
               legend=False,
               leg_fs=12,
               legend_loc=0)
In [115]:
# Plotting new daily deceased cases in Italy
cust_bar_plot((days_tot, italy_deceas_incr_0, 3, ""),
               figsize_w=18, figsize_h=12,
               title="Coronavirus COVID-19 new daily deceased cases in Italy",
               title_fs=18, title_offset=20,
               rem_borders=True,
               label_fs=12, tick_fs=10, 
               x_label="month/day",
               rot=90,
               y_label=None,
               legend=False,
               leg_fs=12,
               legend_loc=0)

7.7. UK and US

UK and US have followed quite relaxed policies in containing the spread of the virus during the first days.

NOTE: The data from UK does not include the Isle of Man and the Channel Islands.

NOTE: The following data is wrong since the cumulative data cannot decrease (leading to a negative daily increment):

In [116]:
print("Error data in confirmed cases in UK:")
find_error_days(uk_conf_0)
Error data in confirmed cases in UK:
[]
In [117]:
# Comparing cumulative confirmed Coronavirus cases in UK and US
cust_line_plot(#(days_tot, finland_conf_0, ".", '-', 0, "Finland"),
               (days_tot, uk_conf_0, ".", '-', 4, "UK"),
               (days_tot, us_conf_0, ".", '-', 3, "US"),
               figsize_w=18, figsize_h=12,
               title="Coronavirus COVID-19 cumulative confirmed cases "\
                     "in UK and US",
               title_fs=18, title_offset=20,
               rem_borders=True,
               label_fs=12, tick_fs=10, 
               x_label="month/day",
               rot=90,
               y_label=None,
               legend=True,
               leg_fs=12,
               legend_loc=0)
In [118]:
# Comparing cumulative deceased Coronavirus cases in UK and US
cust_line_plot(#(days_tot, finland_deceas_0, ".", '-', 0, "Finland"),
               (days_tot, uk_deceas_0, ".", '-', 4, "UK"),
               (days_tot, us_deceas_0, ".", '-', 3, "US"),
               figsize_w=18, figsize_h=12,
               title="Coronavirus COVID-19 cumulative deceased cases "\
                     "in UK and US",
               title_fs=18, title_offset=20,
               rem_borders=True,
               label_fs=12, tick_fs=10, 
               x_label="month/day",
               rot=90,
               y_label=None,
               legend=True,
               leg_fs=12,
               legend_loc=0)
In [119]:
# Comparing new daily confirmed cases Coronavirus cases in UK and US
cust_line_plot(#(days_tot, finland_conf_incr_0, ".", '-', 0, "Finland"),
               (days_tot, uk_conf_incr_0, ".", '-', 4, "UK"),
               (days_tot, us_conf_incr_0, ".", '-', 3, "US"),
               figsize_w=18, figsize_h=12,
               title="Coronavirus COVID-19 new daily confirmed cases "\
                     "in UK and US",
               title_fs=18, title_offset=20,
               rem_borders=True,
               label_fs=12, tick_fs=10, 
               x_label="month/day",
               rot=90,
               y_label=None,
               legend=True,
               leg_fs=12,
               legend_loc=0)

7.8. Brazil, Russia and India

Whereas in the first half of the year the virus has hit mostly China, Europe and the US, by the end of winter the number of active cases in China was down to very low values and the same has happened in most of Europe by the end of spring.

However, in other parts of the world, like Russia, India and Brazil the curves are still in a growing phase at the beginning of summer.

The following chart shows the cumulative confirmed cases in those 3 Countries. For reason of scale, Finnish curve would not bi visible in the same chart so Italy curve has been added to show a comparison with the number of cases over time in one of the most hit Countries in Europe.

The second chart shows a similar plot for the deceased cases (which are less sensible to the Country specific testing policy).

While Brazil and Russia have entered the linear growing phase, India is still in the exponential growing phase.

In [120]:
# Plotting cumulative confirmed cases over time for Brazil, Russia and India
# compared to Italy
cust_line_plot((days_tot, brazil_conf_0, ".", '-', 2, "Brazil"),
               (days_tot, russia_conf_0, ".", '-', 0, "Russia"),
               (days_tot, india_conf_0, ".", '-', 1, "India"),
               (days_tot, italy_conf_0, ".", '-', 3, "Italy"),
               figsize_w=18, figsize_h=12,
               title="Coronavirus COVID-19 cumulative confirmed cases "\
                     "in Brazil, Russia and India",
               title_fs=18, title_offset=20,
               rem_borders=True,
               label_fs=12, tick_fs=10, 
               x_label="month/day",
               rot=90,
               y_label=None,
               legend=True,
               leg_fs=12,
               legend_loc=0)
In [121]:
# Plotting cumulative deceased cases over time for Brazil, Russia and India
# compared to Italy
cust_line_plot((days_tot, brazil_deceas_0, ".", '-', 2, "Brazil"),
               (days_tot, russia_deceas_0, ".", '-', 0, "Russia"),
               (days_tot, india_deceas_0, ".", '-', 1, "India"),
               (days_tot, italy_deceas_0, ".", '-', 3, "Italy"),
               figsize_w=18, figsize_h=12,
               title="Coronavirus COVID-19 cumulative deceased cases "\
                     "in Brazil, Russia and India",
               title_fs=18, title_offset=20,
               rem_borders=True,
               label_fs=12, tick_fs=10, 
               x_label="month/day",
               rot=90,
               y_label=None,
               legend=True,
               leg_fs=12,
               legend_loc=0)

7.9. Normalizing by Country population

7.9.1. List of Variables Affecting Potentially the Curves

The curves related to the cumulative confirmed cases seem to have similar shape. The main difference seems to be the height.

The height of those curves can differ for different reasons, including:

  • the Country overall population (obviously the more people are in the Country, the more people can get infected)
  • the population density (higher is the population density, easier it might be for the virus to spread)
  • demographics (older is the population, easier is for the virus to kill)
  • average health conditions of the population (healthier is the population, harder is for the virus to kill)
  • genetics ?
  • climate (the virus might have more difficulty to survive in too cold or too hot weather)
  • pollution (there are preliminary indications that pollution might facilitate the spread of the virus)
  • possible mutations of the virus in that area
  • the testing policy in the Country (the more people a Country tests, the more infected cases might be discovered)
  • which containment measures have been taken by Authorities and how early
  • how well the population has followed the containment measures
  • whether the Country is in a central area and whether there is a lot of movement of people
  • last but not least, the stage in which the Country is (the curves follow all the same smooth-steep-smooth shape so Countries where the virus has just started to spread show lower curves)

It might be interesting to isolate the first variable, Country population, by dividing the values by the Country population in order to calculate the amount of cases per capita. The result is shown in plots in this section.

NOTE: The Country population figures are approximative.

7.9.2. Confirmed Cases: Summary of Findings from the Analysis

The plots show that the other variables still can affect the curve as much as 10 times..

When comparing Scandinavian Countries and Estonia, Finland has the lowest number of confirmed cases per capita. Iceland has the highest number.

Among the analyzed European Countries, Luxemburg has the highest confirmed cases curve, followed by Spain and Belgium (which have values that are comparable with Iceland). Poland is the only Country among those ones that have been analyzed, that has a confirmed cases curve lower than Finland.

Note that one of the reasons why UK and Finland curves started pretty low might be due to the fact that they are quite isolated geographically and therefore the virus started to spread later.

However, those curves clearly show that in Countries that have not taken prompt containment actions, such as UK, US and Sweden, those curves started to take a steeper shape.

In [122]:
# Comparing cumulative confirmed cases over time for Finland,
# and other Scandinavian Countries plus Estonia in percentage of the Country population
cust_line_plot((days_tot, finland_conf_0_perc, ".", '-', 0, "Finland"),
               (days_tot, denmark_conf_0_perc, ".", '-', 3, "Denmark"),
               (days_tot, norway_conf_0_perc, ".", '-', 6, "Norway"),
               (days_tot, sweden_conf_0_perc, ".", '-', 8, "Sweden"),
               (days_tot, iceland_conf_0_perc, ".", '-', 4, "Iceland"),
               (days_tot, estonia_conf_0_perc, ".", '-', 7, "Estonia"),
               figsize_w=18, figsize_h=12,
               title="Coronavirus COVID-19 cumulative confirmed cases "\
                     "in Finland and other Scandinavian Countries plus Estonia \n"\
                     "in percentage of each Country population",
               title_fs=18, title_offset=20,
               rem_borders=True,
               label_fs=12, tick_fs=10, 
               x_label="month/day",
               rot=90,
               y_label=None,
               legend=True,
               leg_fs=12,
               legend_loc=0)
In [123]:
# Comparing cumulative confirmed cases over time for Finland,
# and other European Countries in percentage of the Country population
cust_line_plot((days_tot, finland_conf_0_perc, ".", '-', 0, "Finland"),
               (days_tot, italy_conf_0_perc, ".", '-', 2, "Italy"),
               (days_tot, spain_conf_0_perc, ".", '-', 1, "Spain"),
               (days_tot, germany_conf_0_perc, ".", '-', 4, "Germany"),
               (days_tot, france_conf_0_perc, ".", '-', 3, "France"),
               (days_tot, switzerland_conf_0_perc, ".", '-', 6, "Switzerland"),
               (days_tot, luxembourg_conf_0_perc, ".", '-', 9, "Luxembourg"),
               (days_tot, belgium_conf_0_perc, ".", '-', 8, "Belgium"),
               (days_tot, ireland_conf_0_perc, ".", '-', 7, "Ireland"),
               figsize_w=18, figsize_h=12,
               title="Coronavirus COVID-19 cumulative confirmed cases "\
                     "in Finland and other European Countries \n"\
                     "in percentage of each Country population",
               title_fs=18, title_offset=20,
               rem_borders=True,
               label_fs=12, tick_fs=10, 
               x_label="month/day",
               rot=90,
               y_label=None,
               legend=True,
               leg_fs=12,
               legend_loc=0)
In [124]:
# Comparing cumulative confirmed cases over time for Finland,
# and other European Countries + UK & US in percentage of the Country population
cust_line_plot((days_tot, finland_conf_0_perc, ".", '-', 0, "Finland"),
               (days_tot, netherlands_conf_0_perc, ".", '-', 4, "Netherlands"),
               (days_tot, austria_conf_0_perc, ".", '-', 3, "Austria"),
               (days_tot, portugal_conf_0_perc, ".", '-', 2, "Portugal"),
               (days_tot, poland_conf_0_perc, ".", '-', 7, "Poland"),
               (days_tot, uk_conf_0_perc, ".", '-', 6, "UK"),
               (days_tot, us_conf_0_perc, ".", '-', 5, "US"),               
               figsize_w=18, figsize_h=12,
               title="Coronavirus COVID-19 cumulative confirmed cases "\
                     "in Finland and other European Countries + UK & US \n"\
                     "in percentage of each Country population",
               title_fs=18, title_offset=20,
               rem_borders=True,
               label_fs=12, tick_fs=10, 
               x_label="month/day",
               rot=90,
               y_label=None,
               legend=True,
               leg_fs=12,
               legend_loc=0)
In [125]:
# Plotting cumulative confirmed cases over time for Brazil, Russia and India
# compared to Italy in percentage of each Country population
cust_line_plot((days_tot, brazil_conf_0_perc, ".", '-', 2, "Brazil"),
               (days_tot, russia_conf_0_perc, ".", '-', 0, "Russia"),
               (days_tot, india_conf_0_perc, ".", '-', 1, "India"),
               (days_tot, italy_conf_0_perc, ".", '-', 3, "Italy"),
               figsize_w=18, figsize_h=12,
               title="Coronavirus COVID-19 cumulative confirmed cases "\
                     "in Brazil, Russia and India \n"\
                     "in percentage of each Country population",
               title_fs=18, title_offset=20,
               rem_borders=True,
               label_fs=12, tick_fs=10, 
               x_label="month/day",
               rot=90,
               y_label=None,
               legend=True,
               leg_fs=12,
               legend_loc=0)

7.9.3. Deceased Cases: Summary of Findings from the Analysis

In the attempt to eliminate the variability due to different testing policies in different Countries, similar plots have been created by taking the deceased cases rather than the cumulative confirmed cases as a reference curve.

Finland has the second lowest curve in Scandinavia, after Norway, and the third lowest if also Estonia is counted. Sweden has the higherst curve. (This is the same result that has been obtained before normalization).

Among the analyzed EU Countries, Belgium is the Country with the highest deceased cases curve, followed by Spain and Italy.

Poland has a deceased cases curve lower than Finland.

In [126]:
# Comparing cumulative deceased cases over time for Finland,
# and other Scandinavian Countries plus Estonia in percentage of the Country population
cust_line_plot((days_tot, finland_deceas_0_perc, ".", '-', 0, "Finland"),
               (days_tot, denmark_deceas_0_perc, ".", '-', 3, "Denmark"),
               (days_tot, norway_deceas_0_perc, ".", '-', 6, "Norway"),
               (days_tot, sweden_deceas_0_perc, ".", '-', 8, "Sweden"),
               (days_tot, estonia_deceas_0_perc, ".", '-', 7, "Estonia"),
               figsize_w=18, figsize_h=12,
               title="Coronavirus COVID-19 cumulative deceased cases "\
                     "in Finland and other Scandinavian Countries plus Estonia\n"\
                     "in percentage of each Country population",
               title_fs=18, title_offset=20,
               rem_borders=True,
               label_fs=12, tick_fs=10, 
               x_label="month/day",
               rot=90,
               y_label=None,
               legend=True,
               leg_fs=12,
               legend_loc=0)
In [127]:
# Comparing cumulative deceased cases over time for Finland,
# and other European Countries in percentage of the Country population
cust_line_plot((days_tot, finland_deceas_0_perc, ".", '-', 0, "Finland"),
               (days_tot, italy_deceas_0_perc, ".", '-', 2, "Italy"),
               (days_tot, spain_deceas_0_perc, ".", '-', 1, "Spain"),
               (days_tot, germany_deceas_0_perc, ".", '-', 4, "Germany"),
               (days_tot, france_deceas_0_perc, ".", '-', 3, "France"),
               (days_tot, switzerland_deceas_0_perc, ".", '-', 6, "Switzerland"),
               (days_tot, luxembourg_deceas_0_perc, ".", '-', 9, "Luxembourg"),
               (days_tot, belgium_deceas_0_perc, ".", '-', 8, "Belgium"),
               (days_tot, ireland_deceas_0_perc, ".", '-', 7, "Ireland"),
               figsize_w=18, figsize_h=12,
               title="Coronavirus COVID-19 cumulative deceased cases "\
                     "in Finland and other European Countries \n"\
                     "in percentage of each Country population",
               title_fs=18, title_offset=20,
               rem_borders=True,
               label_fs=12, tick_fs=10, 
               x_label="month/day",
               rot=90,
               y_label=None,
               legend=True,
               leg_fs=12,
               legend_loc=0)
In [128]:
# Comparing cumulative deceased cases over time for Finland,
# and other European Countries + UK & US in percentage of the Country population
cust_line_plot((days_tot, finland_deceas_0_perc, ".", '-', 0, "Finland"),
               (days_tot, netherlands_deceas_0_perc, ".", '-', 4, "Netherlands"),
               (days_tot, austria_deceas_0_perc, ".", '-', 3, "Austria"),
               (days_tot, portugal_deceas_0_perc, ".", '-', 2, "Portugal"),
               (days_tot, poland_deceas_0_perc, ".", '-', 7, "Poland"),
               (days_tot, uk_deceas_0_perc, ".", '-', 6, "UK"),
               (days_tot, us_deceas_0_perc, ".", '-', 5, "US"),               
               figsize_w=18, figsize_h=12,
               title="Coronavirus COVID-19 cumulative deceased cases "\
                     "in Finland and other European Countries + UK & US \n"\
                     "in percentage of each Country population",
               title_fs=18, title_offset=20,
               rem_borders=True,
               label_fs=12, tick_fs=10, 
               x_label="month/day",
               rot=90,
               y_label=None,
               legend=True,
               leg_fs=12,
               legend_loc=0)
In [129]:
# Plotting cumulative deceased cases over time for Brazil, Russia and India
# compared to Italy in percentage of each Country population
cust_line_plot((days_tot, brazil_deceas_0_perc, ".", '-', 2, "Brazil"),
               (days_tot, russia_deceas_0_perc, ".", '-', 0, "Russia"),
               (days_tot, india_deceas_0_perc, ".", '-', 1, "India"),
               (days_tot, italy_deceas_0_perc, ".", '-', 3, "Italy"),
               figsize_w=18, figsize_h=12,
               title="Coronavirus COVID-19 cumulative deceased cases "\
                     "in Brazil, Russia and India \n"\
                     "in percentage of each Country population",
               title_fs=18, title_offset=20,
               rem_borders=True,
               label_fs=12, tick_fs=10, 
               x_label="month/day",
               rot=90,
               y_label=None,
               legend=True,
               leg_fs=12,
               legend_loc=0)

7.10. Demographic Considerations

In [130]:
print("The range of the median age in the Countries that are analyzed here is: "\
      "{:.1f} years"\
     .format(median_age_range), "\n")
print("The range of the median age in the Scandinavian Countries that are analyzed here is: "\
      "{:.1f} years"\
     .format(scand_median_age_range), "\n")
print("The range of the median age in the EU Countries that are analyzed here is: "\
      "{:.1f} years"\
     .format(eu_median_age_range))
The range of the median age in the Countries that are analyzed here is: 18.9 years 

The range of the median age in the Scandinavian Countries that are analyzed here is: 6.0 years 

The range of the median age in the EU Countries that are analyzed here is: 9.2 years

7.11. World View

By looking the all world, the virus is still in the linear growing phase and there is no sign of slowing down.

In [131]:
# Plotting daily cumulative cases in the all world
cust_line_plot((days_tot, world_conf_tot, ".", '-', 0, "confirmed cases"),
               (days_tot, world_recov_tot, ".", '-', 2, "recovered cases"),
               (days_tot, world_deceas_tot, ".", '-', 3, "deceased cases"),
               (days_tot, world_act_tot, ".", '-', 1, "active cases"),
               figsize_w=18, figsize_h=12,
               title="Coronavirus COVID-19 cumulative cases in the all world "\
                     "over time",
               title_fs=18, title_offset=20,
               rem_borders=True,
               label_fs=12, tick_fs=10, 
               x_label="month/day",
               rot=90,
               y_label=None,
               legend=True,
               leg_fs=12,
               legend_loc=0)
In [132]:
# Plotting new daily cases in the all world
cust_bar_plot((days_tot, world_conf_incr, 0, ""),
              figsize_w=18, figsize_h=12,
              title="Coronavirus COVID-19 new daily confirmed cases in the all world",
              title_fs=18, title_offset=20,
              rem_borders=True,
               label_fs=12, tick_fs=10, 
               x_label="month/day",
               rot=90,
               y_label=None,
               legend=False,
               leg_fs=12,
               legend_loc=0)
In [133]:
# Plotting increments in the active cases in the all world
cust_bar_plot((days_tot, calc_increments(world_act_tot), 1, ""),
               figsize_w=18, figsize_h=12,
               title="Coronavirus COVID-19 increments in the active cases "\
                     "in the all world",
               title_fs=18, title_offset=20,
               rem_borders=True,
               label_fs=12, tick_fs=10, 
               x_label="month/day",
               rot=90,
               y_label=None,
               legend=False,
               leg_fs=12,
               legend_loc=0)

7.11.1. Lethality

The estimated average daily number of deaths due to other reasons has been added with the only scope of putting the numbers into context.

In this comparison the deaths by other reasons are estimated with a linear model, which is clearly an approximation since, for example, seasonal flu and suicides follows certain yearly patterns.

On 4/16 the number of reported deaths due to COVID-19 has overtaken the estimated number of deaths due to seasonal flu since the start of the year.

On 5/13 the number of reported deaths due to COVID-19 has overtaken the estimated number of deaths by suicide since the start of the year.

Currently, the number of deaths by COVID-19 grows somehow linearly at about 5000 deaths/day. Therefore, unless this growth will slow down, the number of estimated deaths due to other reasons (like for example road traffic accidents) might at a certain point become higher.

By assuming the number of COVID-19 reported deaths worldwide will stay constant at around 4000/day for the rest of the year, by the end of the year the number of deaths by COVID-19 worldwide would reach about 1.2 millions, whereas the number of deaths by seasonal flu is estimated to be around 470.000 (-/+ 38%). This means an overall COVID-19 mortality rate 2.6 times higher than seasonal flu and slightly lower than traffic road accidents (1.3 millions).

The following shall be noted:

  • The deaths by COVID-19 might be under estimated due to the fact that not all the population is tested
  • The average deaths by seasonal flu in year 2020 might be less than normal due to the high hand hygiene that has been introduced due to the novel Coronavirus. Similarly, the deaths due to road traffic accidents might be slight less than expected due to the reduced mobility of people due to containment measures
  • This comparison tells nothing about the IFR. In particular, it should be noted that, without the containment measures that have been adopted worldwide, the number of COVID-19 deaths would have been very likely considerably higher

Sources for the additional info:
- https://www.worldometers.info/
- https://www.who.int/mental_health/prevention/suicide/suicideprevent/en/
- https://www.who.int/mediacentre/events/meetings/2011/road_safety/en/
- https://www.who.int/news-room/fact-sheets/detail/tobacco

In [134]:
# Plotting new daily deceased cases in the all world
cust_bar_plot((days_tot, world_deceas_incr, 3, 
               "Daily (reported) deceased cases by COVID-19"),
              figsize_w=18, figsize_h=12,
              title="Coronavirus COVID-19 new daily deceased cases "\
                    "in the all world",
              title_fs=18, title_offset=20,
              rem_borders=True,
              label_fs=12, tick_fs=10, 
              x_label="month/day",
              rot=90,
              y_label=None,
              legend=True,
              leg_fs=12,
              legend_loc=0,
              first_line_y=1288,
              first_line_y_l="Average daily estimated deaths by seasonal flu",
              second_line_y=2192,
              second_line_y_l="Average daily estimated deaths by suicides",
              third_line_y=3561,
              third_line_y_l="Average daily estimated number of deaths "\
                             "by road traffic accidents",
              #fourth_line_y=19178,
              #fourth_line_y_l="Average daily estimated deaths by direct tobacco smoking"
             )
In [135]:
# Creating a series containing the number of deaths by different causes
# so far this year
deceas_causes = pd.Series([world_deceas_tot.iloc[-1],
                           1288*(len(days_tot)+21),
                           2192*(len(days_tot)+21),
                           3561*(len(days_tot)+21),
                           19178*(len(days_tot)+21)],
                          index=["Reported deaths by COVID-19",
                                 "Estimated deaths by seasonal flu",
                                 "Estimated deaths by suicides",
                                 "Estimated deaths by road traffic accidents",
                                 "Estimated deaths by direct tobacco smoking"])
In [136]:
# Showing the number of deaths by different causes so far this year in a bar plot
plot_cust_hbar(deceas_causes.sort_values(),
               figsize_w=8, figsize_h=6,
               frame=False, grid=False,
               ref_font_size=12,
               title_text="Number of deaths by different causes so far this year "\
                          "compared to COVID-19",
               title_offset=20,
               color_numb=3,
               categ_labels=True,
               labels=None,
               rot=0,
               show_values=True,
               omitted_value=0,
               percent=False,
               center_al=False,
               visible_digits=2)

8. Statistics

8.1. World View

In [137]:
# Reordering the columns
daily_rep_group = daily_rep_group.reindex(columns=['Confirmed',
                                                   'Recovered',
                                                   'Deaths',
                                                   'Active'])
In [138]:
print("Grand Total Worldwide:\n")
print(daily_rep_group.sum().to_string())
# Confirmed cases in percentage of the total population
cont_perc_world = daily_rep_group.sum()[0]/(7.8*1000000000)*100
print("\nConfirmed cases in percentage of the total population:")
print("{:.2f} %".format(cont_perc_world))
Grand Total Worldwide:

Confirmed    41695675
Recovered    28345817
Deaths        1137193
Active       12212665

Confirmed cases in percentage of the total population:
0.53 %
In [139]:
# Mortality (worldwide)
mort = (daily_rep_group.sum()[2]/daily_rep_group.sum()[0])*100
print("'Calculated' mortality worldwide (Case Fatality Rate): {:.2f} %\n".format(mort))
print("IMPORTANT NOTE:\nThe actual mortality (Infection Fatality Rate) could be much lower",
      "due to the fact that\nnot all infected people have been tested!\n"
      "On the other hand, the counted deaths are due to infections that happened",
      "weeks ago.\nThis means that, as long as the contagius cases increase, "
      "the calculated mortality\nis under-estimated.")
'Calculated' mortality worldwide (Case Fatality Rate): 2.73 %

IMPORTANT NOTE:
The actual mortality (Infection Fatality Rate) could be much lower due to the fact that
not all infected people have been tested!
On the other hand, the counted deaths are due to infections that happened weeks ago.
This means that, as long as the contagius cases increase, the calculated mortality
is under-estimated.

8.2. Top Ten Countries

In [140]:
# The top 10 Countries by number of confirmed cases in descending order
conf_top_10 = daily_rep_group.sort_values(by ='Confirmed', ascending = False).\
              head(10)['Confirmed']
In [141]:
# Showing the top 10 Countries by number of confirmed cases in a bar plot
plot_cust_hbar(conf_top_10.sort_values(),
               figsize_w=16, figsize_h=12,
               frame=False, grid=False,
               ref_font_size=14,
               title_text="Countries by number of confirmed cases "\
                          "in descending order (top 10)",
               title_offset=20,
               color_numb=0,
               categ_labels=True,
               labels=None,
               rot=0,
               show_values=True,
               omitted_value=0,
               percent=False,
               center_al=True,
               visible_digits=2)
In [142]:
# The top 10 Countries by number of recovered cases in descending order
recov_top_10 = daily_rep_group.sort_values(by ='Recovered', ascending = False).\
               head(10)['Recovered']
In [143]:
# Showing the top 10 Countries by number of recovered cases in a bar plot
plot_cust_hbar(recov_top_10.sort_values(),
               figsize_w=16, figsize_h=12,
               frame=False, grid=False,
               ref_font_size=14,
               title_text="Countries by number of recovered cases "\
                          "in descending order (top 10)",
               title_offset=20,
               color_numb=2,
               categ_labels=True,
               labels=None,
               rot=0,
               show_values=True,
               omitted_value=0,
               percent=False,
               center_al=True,
               visible_digits=2)
In [144]:
# The top 10 Countries by number of deceased cases in descending order
deceas_top_10 = daily_rep_group.sort_values(by ='Deaths', ascending = False).\
                head(10)['Deaths']
In [145]:
# Showing the top 10 Countries by number of deceased cases in a bar plot
plot_cust_hbar(deceas_top_10.sort_values(),
               figsize_w=16, figsize_h=12,
               frame=False, grid=False,
               ref_font_size=14,
               title_text="Countries by number of deceased cases "\
                          "in descending order (top 10)",
               title_offset=20,
               color_numb=3,
               categ_labels=True,
               labels=None,
               rot=0,
               show_values=True,
               omitted_value=0,
               percent=False,
               center_al=True,
               visible_digits=2)
In [146]:
# The top 10 Countries by number of active cases in descending order
act_top_10 = daily_rep_group.sort_values(by ='Active', ascending = False).\
             head(10)['Active']
In [147]:
# Showing the top 10 Countries by number of active cases in a bar plot
plot_cust_hbar(act_top_10.sort_values(),
               figsize_w=16, figsize_h=12,
               frame=False, grid=False,
               ref_font_size=14,
               title_text="Countries by number of active cases "\
                          "in descending order (top 10)",
               title_offset=20,
               color_numb=1,
               categ_labels=True,
               labels=None,
               rot=0,
               show_values=True,
               omitted_value=0,
               percent=False,
               center_al=False,
               visible_digits=2)
In [148]:
print("\n(*) Note that for certain Countries the figures in the previous three tables",
      "contain also off shore territories.")
print("For example, for France the numbers include:\n\n",
      "- French Polynesia\n",
      "- New caledonia\n",
      "- St Martina\n",
      "- Saint Barthelemyia\n",
      "- French Guiana\n",
      "- Guadelupe\n",
      "- Mayotte\n",
      "- Reunion\n")
(*) Note that for certain Countries the figures in the previous three tables contain also off shore territories.
For example, for France the numbers include:

 - French Polynesia
 - New caledonia
 - St Martina
 - Saint Barthelemyia
 - French Guiana
 - Guadelupe
 - Mayotte
 - Reunion

8.3. Finland

In [149]:
# Visualizing the current status in Finland
print("Latest situation in Finland:\n")
print(daily_rep_group.loc['Finland'].to_string())
# Confirmed cases in percentage of the total population
cont_perc_fin = daily_rep_group.loc['Finland'][0]/(5.513*1000000)*100
print("\nConfirmed cases in percentage of the total population:")
print("{:.2f} %".format(cont_perc_fin))
Latest situation in Finland:

Confirmed    14255
Recovered     9800
Deaths         355
Active        4100

Confirmed cases in percentage of the total population:
0.26 %
In [150]:
# Mortality (Finland)
mort_fin = (daily_rep_group.loc['Finland'][2]/daily_rep_group.loc['Finland'][0])*100
print("'Calculated' mortality in Finland (Case Fatality Rate): {:.2f} %\n".format(mort_fin))
print("IMPORTANT NOTE:\nThe actual mortality (Infection Fatality Rate) could be much lower",
      "due to the fact that\nnot all infected people have been tested!")
'Calculated' mortality in Finland (Case Fatality Rate): 2.49 %

IMPORTANT NOTE:
The actual mortality (Infection Fatality Rate) could be much lower due to the fact that
not all infected people have been tested!

9. Conclusions

Currently, the number of known infected COVID-19 cases is about 0.53% of the world population and has produced already more deaths than seasonal flu worldwide. After an exponential growing phase, currently the number of new confirmed cases is growing linearly.

Even though the virus originated from China, it has spread west to Europe and then further west to US and South America.

The first wave has been over in China around the end of winter and in most of Europe around the end of spring.

Currently, most of the confirmed cases are in US, followed by India, Brazil and Russia. In China a second wave has followed and a third wave started at the beginning of the summer.

Unfortunately, the Finnish Institute for Health and Welfare (THL) does not seem to keep a complete public API for uploading daily the time series. In particular, there is no reliable estimate of the number of recovered cases and therefore it is not possible to get a reliable curve for the active cases, which is actually the most important curve to follow the evolution of the epidemic.

The confirmed reported cases are about 0.26% of the Finnish population. The Finnish curve of the confirmed cases is currently in a slowing down growing phase with signs for a possible second wave. (Such a second wave is visible in other European Countries like Spain and France as well as Germany, Belgium and Netherlands.) The confirmed cases curve in Finland is the lowest pro capita in Scandinavia and is one of the lowest in Europe (even though two of the closest neighbors, Sweden and Russia, have confirmed cases pro capita curves that are clearly above the world average). The same applies to the cumulative deceased cases, suggesting that the low curve in the cumulative confirmed cases might not be due to a too relaxed testing policy. The case fatality rate is numerically slightly lower than the world average.

Even though the actual percentage of people that have been in contact with the virus is certainly higher, it should be noted that such low numbers suggest that the immunity in Finland is at very low levels currently (there is still a quite high percentage of susceptible people).

There might be different reasons for the relative difficulty of the virus to spread in Finland, including remote geographical location, low population density, low level of pollution, culture and local practices (as keeping physical distances when greeting, spending a lot of time outdoor in Nature and vising sauna frequently) and prompt containment actions.

In a further study it would be interesting to verify those assumptions scientifically.

10. Acknowledgements

Many thanks to Johns Hokpins University for sharing and maintaining daily the source csv files.

Many thanks to Coursera for providing a very informative course.

Many thanks to colleagues and friends who have contributed by providing links and comments.


In [151]:
print("Last plotted day:", dt.datetime.strptime(last_day, "%m-%d-%Y").\
      date().strftime("%d-%b-%Y"))
end_time = dt.datetime.utcnow()
script_duration = end_time - start_time
print("\nRunning time for the full script (hh:mm:ss):", script_duration)
Last plotted day: 22-Oct-2020

Running time for the full script (hh:mm:ss): 0:02:30.991934

Used software:
- Jupyter Notebook server 6.0.1
- Python 3.6.8
- numpy 1.18.2
- pandas 1.0.3
- matplotlib 3.1.2
- seaborn 0.9.0
- regex 2019.8.19
on top of Linux Ubuntu 18.04